Preface 


This volume presents the proceedings of ALife 2016, the Fifteenth International
Conference on the Synthesis and Simulation of Living Systems, held July 4th-8th in
Cancun, Mexico (http://xva.life). This was the first time that the conference took
place in Latin America.


1 The ALife 2016 Program 

We received a total of 174 submissions, of which 68 were accepted as oral
presentations and 41 as poster presentations; both are included in these
proceedings. We also accepted 21 late-breaking abstracts for poster presentation,
which are available at the conference website http://xva.life.

The conference program of this year included: 

• Eight keynote presentations by internationally renowned speakers on a wide
variety of topics:

- Randall Beer (Indiana University), 

- Mark Bickhard (Lehigh University), 

- Ezequiel Di Paolo (Ikerbasque), 

- Jorge M. Pacheco (Universidade do Minho, Portugal), 

- Alexandra Penn (University of Surrey), 

- Ken Rinaldo (The Ohio State University), 

- Francisco C. Santos (Instituto Superior Tecnico, University of Lisbon,
Portugal), and

- Linda Smith (Indiana University, Bloomington). 

• Parallel sessions on: 

- ALife and society 

- Origins of life and protocells 

- Self-organization and automation 

- Robotics 

- Genetics 

- Open-ended evolution 



- Morphology 

- Evolvability 

- Cooperation and collective behavior 

- Development 

- Learning and Memory 

- Ecology 

- Artificial societies 

- Language and culture evolution 

- Computational biology 

- Artificial chemistries 

- Living technology 

- Human-Computer interaction 

- Theory and measures 

• Eight workshops:

- SCBCS: Synthesizing Concepts from Biology and Computer Science 
(Organizers: Emily Dolson and Charles Ofria). 

- SLCS: Steering Living and Life-like Complex Systems 
(Organizers: Alexandra Penn, Rob Mills, and Emma Hart). 

- SLACE: Social Learning and Cultural Evolution 

(Organizers: Chris Marriot, Peter Andras, James Borg, and Paulo Smaldino). 

- BFE: The Biological Foundations of Enactivism (Organizers: Eran Agmon, 
Nathaniel Virgo, Tom Froese, and Matthew Egbert). 

- OEE: Open Ended Evolution: Recent Progress 
(Organizers: Mark Bedau, Alastair Channon, and Tim Taylor). 

- GSO: Eighth International Workshop on Guided Self-Organization
(Organizers: Mikhail Prokopenko, Carlos Gershenson, and Daniel Polani). 

- MEW: Morphogenetic Engineering Workshop 
(Organizers: Rene Doursat and Hiroki Sayama). 

- EGT: Multidisciplinary applications of evolutionary game theory 
(Organizers: Tom Lenaerts, Luis A. Martinez-Vaquero, Jelena Grujic,
Francisco C. Santos, and Tom Froese).

• Five tutorials:

- MABE: An Introduction to MABE (Modular Agent Based Evolution) and 
Markov Network Brains. 

- AVIDA-ED: Avida-ED, a tool for teaching and classroom research.

- AEVOL: In silico experimental evolution with the Aevol Software. 



- ISA: Introductory Statistics for ALife Experiments: A Visual Approach. 

- NetLogo: A low threshold/high ceiling for programming multi-agent
models.

• The third ISAL Summer School. 

• Art exhibition 

(Ken Rinaldo (The Ohio State University), Tatsuo Unemi (Soka University), 
Daniel Bisig (University of Zurich), Mario Garcia-Valdez (Instituto Tecnologico
de Tijuana), Eduardo Makoszay Mayen, and Antonio Isaac Gomez). 


2 About the Editors 

Carlos Gershenson is a tenured research professor at the Instituto de Investigaciones 
en Matematicas Aplicadas y en Sistemas of the Universidad Nacional Autonoma de 
Mexico, where he leads the Self-organizing Systems Lab. His research interests include 
complex systems, self-organization, urbanism, philosophy, and artificial life. 

Tom Froese is Associate Professor at the Research Institute for Applied
Mathematics and Systems of the National Autonomous University of Mexico, Mexico City.
He is also affiliated with the Center for the Sciences of Complexity at the same
university. His research interests include evolutionary robotics, origins of life, and
cognitive science.

J. Mario Siqueiros is Associate Professor at the Instituto de Investigaciones en
Matematicas Aplicadas y en Sistemas of the Universidad Nacional Autonoma de Mexico.
His research interests include complex systems, computational social science,
emergence and evolution of culture, social-ecological systems, and philosophy of biology.

Wendy Aguilar works as a Research Assistant at the Department of Computer
Science, Instituto de Investigaciones en Matematicas Aplicadas y en Sistemas at the
Universidad Nacional Autonoma de Mexico. She received her Ph.D., M.Sc., and B.Sc.
degrees in Computer Science from the Universidad Nacional Autonoma de Mexico
(UNAM). Her research areas include developmental artificial intelligence and
computational creativity.

Eduardo Izquierdo is an Assistant Professor in the Cognitive Science Program at
Indiana University. He is also affiliated with the Program in Neuroscience, School of
Informatics and Computing, the Indiana University Network Science Institute, the Center
for Complex Networks and Systems Research, and the Center for the Integrative Study
of Animal Behavior. His research focuses on understanding behavior from
brain-body-environment interactions in simple model organisms.

Hiroki Sayama is Director of the Center for Collective Dynamics of Complex
Systems and Associate Professor of Systems Science and Industrial Engineering at
Binghamton University, State University of New York, USA. His research areas include
complex systems, dynamical networks, artificial chemistry, computational social
science, and interactive evolutionary computation.



3 About the Conference Logo 

In recognition of the Maya, this year’s ALife conference logo includes a Mayan
hieroglyph for soul-heart (Figure 1), which is closely related to life. It is read o’hlis,
and refers to one of the essential animating souls or forces that is specific to the
human species with its emotions, knowledge, and thought (Erik Velasquez Garcia, pers.
comm.). This force is also an embodiment of the spirit of the god that was principally
responsible for the creation of humanity, the god of maize, since the Maya believed
that our flesh was originally made from ground corn. The hieroglyph is surrounded by
motifs related to the artificial forces of the contemporary world: mechanical and
digital technologies. The three bars below ALife are the number fifteen in Mayan numerals
(one bar represents the number 5), indicating the fifteenth edition of the conference.



Figure 1: The conference logo of ALife XV. 
Introduction


Tom Froese, J. Mario Siqueiros, Wendy Aguilar, 
Eduardo J. Izquierdo, Hiroki Sayama, and Carlos Gershenson 


Cancun is located in what is known as the Mayan Riviera. In the popular
imagination of our contemporary world, the Maya are best known as a lost civilization whose
ruins are scattered throughout the tropical jungles of Mexico and Central America. Yet,
as Coe [3] (p. 11) reminds us, the Maya are hardly a vanished people: their population
numbers over 7 million people, which makes them the largest single block of
indigenous people north of Peru. Already at the time of the Spanish conquest, the Maya were
found in an area that included all of the Yucatan Peninsula (including, of course, the
area of Cancun) and parts of the states of Tabasco and Chiapas in Mexico, as well as
Guatemala, Belize, and the western portion of Honduras and El Salvador.

Ideas related to the creation of life have a long history in the Old World, reaching 
back to the times of classical antiquity, and are expressed in a diverse range of
intersecting fields jointly known as artificial life today [1]. It has been argued that the modern
sciences of the artificial could learn something from these classical myths [6]. What is 
less well known is that ancient Mesoamerican cultures also possessed a rich corpus of 
myths related to what we would now refer to as artificial life. In other words, although 
this is the first time that an ALife conference takes place in Mesoamerica, in a sense 
the central concept of the field, i.e. the creation of life via artificial means, has already 
been around the region for centuries if not for over a millennium. In the following we 
will focus on the specific case of the Maya, from whom several myths related to this 
concept have survived. 

According to the Mayan creation account, as recorded by the Quiche Maya during
colonial times in the 16th century book Popol Vuh [10]¹, all was empty in the
beginning, only murmurs and ripples in an endless sea under a dark night sky. The creator
gods convened to bring about the dawn of the world with all of its geological and
biological features. But this world was unable to appreciate the grandeur of their efforts
and the animals lacked language to express themselves properly. So the gods decided to
create humans. A first human design made from mud failed mainly because of its
unreliable material composition. The mud man was dissolving and only talked
senselessly². So the gods dismantled it.

The gods therefore tried another human design, this time based on wood. They
created “manikins, woodcarvings, human in looks and human in speech. [...] They

¹ Page numbers of all quotations from the Mayan myths refer to this book.

² As noted by Tedlock (1996), this reference to a single man made from mud might be an allusion to the
Biblical myth of God’s creation of Adam from the dust of the ground, turning this part of the story into an
indirect resistance to the colonial doctrine. For the writers of the Popol Vuh, “a singular creature of mud
could neither have made sense, nor walked nor multiplied” (p. 231).
came into being, they multiplied, they had daughters, they had sons, these manikins,
woodcarvings” (p. 70). Nevertheless, although this design was already better, these
replicating manikins had another unexpected defect, because “there was nothing in
their hearts and nothing in their minds, no memory of their mason and builder” (p. 70).
Although these wooden people could multiply, talk, and move around, their bodies
were still too dry and deformed and they lacked an appreciation of their own existence,
“and so they fell, just an experiment and just a cutout for humankind” (ibid.).

In revenge the animal species they had been eating and the tools they had been 
using began to speak themselves and turned against their former masters: “Their faces 
were crushed by things of wood and stone. Everything spoke: their water jars, their 
tortilla griddles, their plates, their cooking pots, their dogs, their grinding stones, each 
and every thing crushed their faces” (p. 72). What is most interesting from the
perspective of artificial life is that their household utensils, and even the houses, became
enlivened and took issue with the people’s fall from grace. Here, for example, is the 
lament of their grinding stones: 


“We were undone because of you. Every day, every day, in the dark, in the
dawn, forever, r-r-rip, r-r-rip, r-r-rub, r-r-rub, right in our faces, because of 
you. This is the service we gave you at first, when you were still people, 
but today you will learn of our power. We shall pound and we shall grind 
your flesh,” their grinding stones told them. (p. 72) 


As their utensils rose up against them and their houses collapsed, these wooden
humans with their crushed faces scattered in all directions and took refuge in the forests.
Today we can still see this previous version of the human form: they are what we call
monkeys. Interestingly, this shows that the Maya had no problem conceiving of
non-human primates as precursors to humans, an idea which only took hold in the Western
imagination following the development of evolutionary theory, in particular after the
publication of Darwin’s controversial book The Descent of Man in 1871.

According to Tedlock, the moral of this story is that we disenchant nature and
over-rely on technology at our peril. This ethical concern becomes even more pressing when
we consider the contemporary push toward using the tools and insights of artificial life
for the creation of so-called living technology [2]. For if we were indeed able to create
the conditions for genuinely autonomous, automatically self-reproducing, and
open-endedly evolving examples of technology, then how could we guarantee that this living
technology will remain favorably inclined towards humans rather than turn out to be
just self-interested or even confrontational [4] (pp. 549-551)? After all, autonomy is
the logical opposite of controllability. Such a vision is still far from being realized
in practice. Nevertheless, this failed creation story is a timely reminder that we must
balance our efforts to improve technology with a healthy dose of humility and caution.
We do not want to end up creating tools akin to the Maya’s disgruntled grinding stones
that resent their users.

At this point the narrators of the Popol Vuh temporarily leave the problem of the
origins of modern humans aside in order to relate some myths about the origins of
various personified celestial movements and constellations. One of the myths is of
special interest for us because it involves the construction of an artificial life form,
which is explicitly conceptualized as being an artificial replica of organic life. In this
myth the Four Hundred Boys, patron deities of alcoholic intoxication, scheme to kill a
self-aggrandizing crocodilian earth monster, Zipacna, but end up being killed by him
instead (and thus serving as a representation of the setting of the Pleiades). The Hero
Twins, Hunahpu and Xbalanque, want to avenge their deaths and so they set a trap for
Zipacna:

It’s mere fish and crabs that Zipacna looks for in the waters, but he’s
eating every day, going around looking for his food by day and lifting up
mountains by night. Next comes the counterfeiting of a great crab by
Hunahpu and Xbalanque. And they used bromeliad flowers, picked from
the bromeliads of the forest. These became the forearms of the crab, and
where they opened were the claws. They used a flagstone for the back of
the crab, which clattered. After that they put the shell beneath an overhang,
at the foot of a great mountain. (p. 84)

Then the Hero Twins talk with the hungry Zipacna and let him know that they have 
seen a crab that he could eat. They guide him to the bottom of the canyon: 

The crab is on her side, her shell is gleaming red there. In under the canyon 
wall is their contrivance. “Very good!” Zipacna is happy now. He wishes 
she were already in his mouth, so she could really cure his hunger. He 
wanted to eat her, he just wanted it face down, he wanted to enter, but 
since the crab got on top of him with her back down, he came back out. 

“You didn’t reach her?” he was asked. “No indeed she was just getting on 
top with her back down. I just barely missed her on the first try, so perhaps 
I’d better enter on my back,” he replied. After that he entered again, on his 
back. He entered all the way only his kneecaps were showing now! He 
gave a last sigh and was calm. The great mountain rested on his chest. He 
couldn’t turn over now, and so Zipacna turned to stone. (p. 85)³

In this way Zipacna was “defeated by genius alone” (p. 85), that is, by an artificial 
crab that could make sounds and move around. In other words, although the imagined 
technology, consisting of flowers and stones, is clearly unrealistic, here we find the 
very idea of artificial life as such (see illustration in Figure 1). This myth is special 
because, in contrast, the rebelling utensils were animated by a divine wrath rather than 
by artifice, and the mud man and the wooden manikin were supposed to be actual living 
beings instead of imitations of living beings. However, the idea of the counterfeit crab, 
in an abstract sense, is no different to how we understand the concept of artificial life 
from a modern perspective: it was an artificial imitation of life rather than natural life 
itself. 

³ According to Tedlock (1996), Zipacna’s struggle to wrestle his body into the right position to
consummate his hunger becomes a symbolic parody of sexual intercourse (p. 35). This is confirmed by one of his
Maya informants, who notes with amusement the gender-reversed roles in that finally the man ends up on his
back. This interpretation is also supported by the fact that in Mopan Maya the term for crab, yux, is used as
a metaphor for vulva.
Figure 1: The Hero Twins make an imitation crab. Figure taken from [8] (p. 214);
downloaded from the public domain. 


Unfortunately, it is difficult to date the origins of this mythical crab to a specific 
time period of the ancient Maya, but references to the adventures of the Hero Twins are 
known from Late Formative times (around 300 BC) onwards [7] (p. 134). For example,
it has been argued that proto-Classic period stelae from Izapa, a Mayan site in Chiapas,
Mexico, depict early representations of the Hero Twins’ defeat of Zipacna’s father, the 
monster bird Vucub Caquix (ibid., p. 182). Future research could try to determine more 
precisely when the myth of the artificial crab first arose, and how it compares with the 
more familiar myths from European antiquity, especially from ancient Greece (Mayor, 
2016). 

After recounting a number of additional myths centered on the exploits of the Hero 
Twins and other gods, the narrators of the Popol Vuh are ready to return to the story of 
the origins of human beings. This time the gods are prepared:


And here is the beginning of the conception of humans, and of the search 
for the ingredients of the human body. So they spoke, the Bearer, Begetter, 
the Makers, Modelers named Sovereign Plumed Serpent: “The dawn has 
approached, preparations have been made, and morning has come for the 
provider, nurturer, born in the light, begotten in the light. Morning has
come for humankind, for the people of the face of the earth,” they said. It 
all came together as they went on thinking in the darkness, in the night, as 
they searched and they sifted, they thought and they wondered. And here 
their thoughts came out in clear light. They sought and discovered what 
was needed for human flesh (p. 145). 

This secret ingredient of the living body turns out to be nothing other than drink 
and food, especially corn. What follows is a Mayan version of the saying that you are 
what you eat: 

And this was when they found the staple foods. And these were the
ingredients for the flesh of the human work, the human design, and the water
was for the blood. It became human blood, and corn was also used by
the Bearer, Begetter. [...] And then the yellow corn and white corn were
ground, and Xmucane did the grinding nine times. Food was used, along
with the water she rinsed her hands with, for the creation of grease; it
became human fat when it was worked by the Bearer, Begetter, Sovereign
Plumed Serpent, as they are called. After that, they put it into words: the
making, the modeling of our first mother-father⁴, with yellow corn, white
corn alone for the flesh, food alone for the human legs and arms, for our
first fathers, the four human works. It was staples alone that made up their
flesh (p. 146).

Thus, in the end the creator gods had realized that constructing human beings was 
best done by using existing organic compounds that lend themselves to be formed 
appropriately and which can at the same time sustain the resulting living beings. Here 
we have, in a nutshell, a mythological explanation of the fact that as humans we must 
eat and drink. It is what sustains our existence. 

Interestingly, although the Maya seem to have distinguished between creating
substance (the making) and creating form (the modeling), we can see throughout these
repeated trials and errors of creation a concern for their interdependence. The human
form cannot be artificially imposed on just any kind of substrate; on the contrary, its
embodiment requires very specific material conditions. After having found these
conditions the gods celebrated their ultimate success, all the while emphasizing that the
origins of the human species are an achievement of divine engineering: the first human
beings were not born but made.

⁴ The term “mother-father” is not intended to imply androgynous or dual sex. In fact, the first four persons
that were created were all male. Mother-father is an expression that is used to refer to their role as lineage
heads.
They were simply made and modeled, it is said; they had no mother and
no father. We have named the men by themselves. No woman gave birth 
to them, nor were they begotten by the builder, sculptor, Bearer, Begetter. 

By sacrifice alone, by genius alone they were made, they were modeled 
by the Maker, Modeler, Bearer, Begetter, Sovereign Plumed Serpent. And 
when they came to fruition, they came out human (p. 146). 

The genius of the creator gods, i.e. to sacrifice existing life and to process and 
refashion its matter and form so as to give rise to new life, is reminiscent of a dominant 
approach in synthetic biology, which also famously boasted of having created a new form
of life, in this case by assembling a new cell by reusing existing cells in combination 
with synthesized components [5]. The challenge now facing synthetic biology is to go 
a step further and, like these Mayan gods, create new life without basing it on existing 
individuals, but only on the processing of more basic organic compounds (i.e. to use 
nothing but “food alone”). This feat will require genius indeed. 

There is a final caveat to this success story: it turns out that this time the creator
gods have overshot their target and accidentally created super-humans. These first
humans saw and knew all there was to know about the world and did so without any
movement or effort. The gods were worried because these flawless individuals looked
poised to become godlike themselves, thereby defeating the whole point of their
creation. Accordingly, they decided to remove some of their knowledge and to diminish
their sight, so that the humans would continue to worship while also being concerned
with more practical earthly matters.

“Aren’t they merely ‘works’ and ‘designs’ in their very names? Yet they’ll
become as great as gods, unless they procreate, proliferate at the sowing,
the dawning, unless they increase. Let it be this way: now we’ll take them
apart just a little, that’s what we need. [...]” And such was the loss of the
means of understanding, along with the means of knowing everything, by
the four humans. The root was implanted (p. 148).

There are ways to overcome these inbuilt restrictions, as the humans eventually 
discover. And the Popol Vuh itself, as a sacred book of council, turns out to be one of 
them. We can make sense of this by considering that the same technology that enables 
literacy at the same time enables accumulation of knowledge, and it therefore allows 
humans to progressively better understand and to see more clearly. It is still no different 
in today’s world: now, in addition to all kinds of information technology, we also have 
at our disposal a range of technologies that enable us to overcome the limitations of 
our senses. Yet, as Tedlock [10] (p. 60) reminds us, we should be careful to leave space
for mystery in our lives. The last humans that stopped wondering about the meaning 
of existence and became overly absorbed in exploiting nature and technology lost the 
essence of their humanity. They are still with us today: they are swinging through the 
trees. 

On the other hand, the Maya creation myths do not view that which is artificial as 
necessarily dehumanizing; to the contrary, the creation and use of technology is shown 
to be part of our essential nature. In agreement with ancient and modern traditions
in the philosophy of technology [9], the narrators of the Popol Vuh highlight that the 



artificial is constitutive of our very being. After all, the first humans were made, not
born. What is not clear, and what the narrators leave for their audience to reflect upon,
is how we can ensure that people are empowered rather than overpowered by their
unavoidable use of knowledge and technology. The difficulty of realizing this
ambition is revealed by the end of the Maya civilization, which had largely collapsed even
before the arrival of the Spanish conquerors, a catastrophe that is likely to have been 
precipitated by too much environmental degradation around the major cities. We must 
do our best to prevent humanity from repeating the same mistakes again, a possibility 
that looks increasingly likely and which, though now greatly enhanced in scale, would 
unfortunately be consistent with the Maya’s cyclical view of temporality. 

With this unresolved challenge in mind, we dedicate this year’s installment of the 
ALife conference series to the theme “Artificial life and society”. 
Autopoiesis and Enaction in the Game of Life


Randall D. Beer 1

1 Cognitive Science Program, Indiana University
rdbeer@indiana.edu


Over 40 years ago, the Chilean biologists Humberto Maturana and Francisco Varela 
put forward the notion of autopoiesis as a way to understand living systems and their 
phenomenology. Varela and others subsequently extended this framework to an
enactive approach that places biological autonomy at the foundation of situated and
embodied behavior and cognition. In this talk, I will describe an attempt to place these ideas
on a firmer foundation by studying them within the context of a toy model universe,
John Conway’s Game of Life (GoL) cellular automaton. The talk has both
pedagogical and theoretical goals. Simple concrete models provide an excellent vehicle for
introducing some of the core concepts of autopoiesis and enaction and explaining how 
these concepts fit together into a broader whole. In addition, a careful analysis of such 
toy models can hone our intuitions about these concepts, probe their strengths and 
weaknesses, and move the entire enterprise in the direction of a more mathematically 
rigorous theory. In particular, I will identify the primitive processes that can occur in 
GoL, show how these can be linked together into mutually-supporting networks, map 
the responses of such entities to environmental perturbations, and investigate the paths 
of mutual perturbation that these entities and their environments can undergo. Some of 
the topics that can be examined in GoL include the structure/organization distinction, 
organizational/operational closure, self-production, self-individuation, destructive vs. 
nondestructive perturbations, precariousness, cognitive domain, subjectivity,
significance, sense-making, structural coupling, and enaction. I will end with some comments
on the limitations of the GoL model and directions for future work. 
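
For readers unfamiliar with the model universe mentioned above, the standard GoL update rule can be sketched in a few lines of Python. This is only an illustrative sketch and not code from the talk: the function names `step` and `run`, and the representation of a configuration as a set of live (row, column) cells on an unbounded grid, are assumptions made here for the example.

```python
from collections import Counter


def step(live):
    """Advance a set of live (row, col) cells by one GoL generation.

    A cell is alive in the next generation if it has exactly three live
    neighbours, or if it is currently alive and has exactly two.
    """
    # Count, for every cell adjacent to a live cell, how many live neighbours it has.
    neighbour_counts = Counter(
        (r + dr, c + dc)
        for (r, c) in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


def run(live, generations):
    """Apply `step` repeatedly and return the final configuration."""
    for _ in range(generations):
        live = step(live)
    return live


# Three familiar patterns (hypothetical variable names).
block = {(0, 0), (0, 1), (1, 0), (1, 1)}            # a still life
blinker = {(1, 0), (1, 1), (1, 2)}                  # a period-2 oscillator
glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}   # a moving pattern
```

Under this rule the block is left unchanged by `step`, the blinker returns to its initial configuration after two generations, and the glider reappears shifted by one cell diagonally after four generations. Persistent patterns of this kind are the sort of entity whose responses to perturbation can then be explored by adding or removing cells nearby.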
Gilbert Simondon and the enactive conception
of life and mind


Ezequiel Di Paolo 1 

1 Ikerbasque, Basque Foundation for Science 
ezequiel.dipaolo@ehu.es 


The work of French philosopher Gilbert Simondon is seeing a vigorous rediscovery.
His ideas have a rich, largely untapped potential for science, e.g., in origins of life
studies, developmental psychology, embodied cognition, and artificial life. I summarise
some key concepts of Simondon’s philosophy side-by-side with ideas in enactivism, an
approach to life and mind based on the works of Francisco Varela, Hans Jonas, and
Maurice Merleau-Ponty. I hope to show that there is much overlap between the two
approaches, which is good, but also many productive complementarities, and some
tensions, which is better.

Simondon encourages enactivism by making its implications more explicit. He
advocates the abandonment of hylomorphic metaphysics (the conceptual separability of
form and matter) for an ontology of restless and open-ended materiality, relationality,
and virtuality. According to him, being and becoming are mutually co-defined. The
subject, in her ongoing individuation, sustains inherently meaningful relations with her
world. Physical, biological, mental, and social processes of individuation nicely
complement the different kinds of precarious autonomy and sense-making elaborated by
enactive theory, concepts that in turn are only implicit in Simondon’s work.

Individuation involves the organization that happens in a milieu capable of abundant
potentialities when a process of concrete transduction occurs from more to less
metastable states (crystallization is one example). Organisms are processes of individuation
prevented from finishing through regulated engagements with the world in search of
new sources of potentiality. This coheres with the enactive concept of life as the
regulation of the tensions between self-production and self-distinction. Life and mind,
for Simondon, entail the neotenic expansion of the early stages of individuation such
that its termination is temporarily and progressively delayed. This makes explicit the
material conditions of autonomy and introduces new elements for enactivism such as
the notion of pre-individual criticality as inherent in the living body.

Simondon’s recurrent use of the term information may entail some tensions with
enactivism, although his notion is subtle and different from the (hylomorphic) information
processing metaphor of biological or cognitive functionalism.

I conclude with reflections on the relevance of Simondon’s philosophy of technology
for artificial life, in particular the implication that any life-like artificial system must be 
materially embodied and embedded in concrete, open-ended relations with the world. 
Artificial Life and Society: Philosophies and
Tools for Experiencing, Interacting with and
Managing Real World Complex Adaptive
Systems

Alexandra Penn 1 

1 Evolution and Resilience of Industrial Ecosystems Project (ERIE), 
Centre for the Evaluation of Complexity across the Nexus (CECAN), 
Centre for Research in Social Simulation (CRESS), 

Dept. of Sociology, University of Surrey, UK
a.penn@surrey.ac.uk 


Many of the grand challenges that society faces are concerned with understanding,
managing and indeed creating complex living, lifelike or hybrid systems at multiple
scales. Conventional approaches are often unsuccessful in dealing with these
complex adaptive systems (CASs), which require management tools that interact with dynamic,
self-organising processes facing perturbation and change, rather than with inert
artefacts. By using interactive steering strategies which exploit CASs’ dynamics and
self-organisation, we can attempt to manoeuvre systems to more preferable, stable states
and update our interventions as they adapt.

This, however, is not sufficient for real world problem solving. In systems with
human involvement, key drivers are often social, political or economic. Possible
interventions are limited, goals are subjective and participatory or political processes must
be integrated. Not only do we need innovative ways to manage our systems which
embrace their complexity, we need broadly-accepted narratives of systems as complex
and adaptive to help us to shape policy and management. We must be prepared to take
action with incomplete knowledge and require tools and methodologies for steering,
monitoring and learning, allowing us to adapt as systems respond to intervention, all
in complex adaptive systems which we for the most part experience and intuit rather
than measure.

New paradigms are urgently required and our community can play a key role. 
Artificial Life offers tools and philosophical approaches well-matched to the nature of 
these systems and can provide important perspectives on how to progress. I will give 
an overview of the potential contribution that I believe that ALife can make and the 
need to connect productively with many different disciplines. In particular, ALife's 
technologies and approaches, combined with its inherent creativity, focus on synthetic 
methods and “philosophy with a screwdriver” ethos, provide what I believe is the 
perfect basis for engaging with real world complex systems. 
Why development matters to (artificial) life 
Lessons from human babies 


Linda Smith 1 

1 Dept. of Psychological and Brain Sciences, Indiana University 
smith4@indiana.edu 


Why do living forms develop? Development, like evolution and culture, is a process 
that creates complexity by accumulating change. At any moment, the developing agent 
is a product of all previous developments, and any new change begins with and must 
build on those previous developments. Biological systems that are flexibly smart have 
relatively long periods of immaturity. Why is this? This talk will consider answers to 
this question using evidence from the first two years of life of human infants. The core 
ideas are that an adaptive system that can succeed in varied and novel contexts is slow, 
in that it does not settle too fast; develops new mechanisms of change and learning 
processes over its lifetime; and develops in a series of different environments. 
Cognition and the Brain 


Mark Bickhard 1 

1 Department of Philosophy, Lehigh University 
mhb0@lehigh.edu 


Standard semantic information models are, arguably, conceptually incoherent and 
factually false about the brain. Nevertheless, they constitute the primary frameworks 
for modeling cognitive processes, including in the brain. If such models are ultimately 
not viable, what sort of framework could model cognition in the brain? I will 
argue that an action-based approach, in the general lineage of pragmatism, provides 
an alternative modeling framework. In this approach, anticipatory processes are 
necessary as part of the evolutionary solution to (inter-)action selection, and these yield 
emergent truth value possibilities of being true or false and thus ground cognition 
and representation in general. Such an action framework requires timing, thus 
oscillatory/modulatory processes, and this is in fact what we find as constituting 
functional processes in the brain. I will outline a micro-scale level of this model and, 
if time permits, a bit of a macro-scale level. This model has some superficial similarities 
to predictive brain models, but also fundamental and crucial differences. 
In symbio biopoiesis as model of evolved Alife 
(400 PPM Microbiome) 


Ken Rinaldo 1 
1 Department of Art, 
Art & Technology 
The Ohio State University 
rinaldo.2@osu.edu 


Artificial life techniques are illustrative in exploring the wisdom of natural living 
systems. Genetic algorithms and cellular automata are computationally complex and 
visually seductive forms of Alife in silico. Robotic artists creating Alife installations 
experience design challenges more akin to works created in vivo. Robotic works function 
both in silico and in vivo, within the virtual spaces of computer code and the 
unpredictable environment of the real world. With robotic Alife installations, questions 
such as whether the interactant will test the system with muscle or try to defeat the 
code are design challenges forcing evolution. 

Interactive artists have pioneered behavior-based works conceived with the current 
understanding of living systems, such as bottom-up emergent behaviors, subsumption 
architectures (Autopoiesis, Fusiform Polyphony), parallel processing (Paparazzi Bots), 
distributed intelligences and energy autonomy (400 PPM Microbiome & Autotelematic 
Spider Bots). 

Still, in silico / in vivo works within Alife are islands of artifice. True living 
systems offer symbiotic convolutions with overlapping living systems at all scales. 
This is a function of their organic nature. Mitochondria and symbiogenesis are excellent 
examples. 

In order for artificial life to further evolve and emerge, artists and scientists will 
need to create larger symbiotically intertwined systems: systems moving beyond in 
silico and in vivo, to in symbio. Prototypical living systems will need to find and collect 
their own energy sources from both living and non-living systems. They will find 
symbiotic intertwining through organic interfaces to complex social systems (Augmented 
Fish Reality) and to bacterial cultures (Enteric Consciousness). Alife research that 
pioneers natural organic breakdown with bacterial cultures, and sustainable practices 
such as aquaponics, offer clear examples (The Farm Fountain). 

In Symbio intertwinings of natural and inorganic electro-mechanical elements will 
be an important and very natural confluence and co-evolution that is necessary between 
living and co-evolving technological cultures, in the future of Alife. 
A long-standing and central problem in Physics is to understand how collective 
behavior results from a given two- or N-body fundamental interaction. Similarly, in a 
society, a central problem is to understand the link between individual social behavior 
and emergent collective phenomena (vaccination, epidemics, crowd behavior, diffusion 
of innovations, global governance, etc.). 

Here I address this problem by letting individuals engage in pair-wise interactions by 
means of a well-defined social dilemma (a prisoner’s dilemma of cooperation). These 
individuals are embedded in a social network that is both complex and adaptive. 
Adaptation here allows individuals to manifest preferences and resolve conflicts of 
interest, reshaping the network accordingly. Exact Monte-Carlo simulations reveal the 
inadequacy of any of the tools developed to date to predict the co-evolutionary dynamics 
of the population at large. I will present and discuss in detail an 
adaptive-network-sensitive observable that is capable of predicting the collective, 
population-wide dynamics, given prior knowledge of the fundamental rules that govern 
the social interaction between 2 individuals in a social network. In this fundamental 
step towards linking individual behavior with population-wide dynamics, I show that 
adaptive social networks act to change the “collective” game, from a 2-person game to 
an N-person game exhibiting radically different co-evolutionary dynamics, associated 
with a concomitant fundamental transformation of the nature of the associated Nash 
equilibria. 
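
The kind of setup described above, pair-wise prisoner's dilemma games played on a network whose links are themselves reshaped by the players, can be illustrated with a minimal Python sketch. It is not the model presented in the talk: the payoff values, the imitation rule, the rewiring rule and all function names (payoff, rewire, imitate, simulate) are assumptions chosen only to make the two coupled processes concrete.

```python
import random

# Prisoner's dilemma payoffs with T > R > P > S (illustrative values, assumed).
T, R, P, S = 1.5, 1.0, 0.0, -0.5


def payoff(me, other):
    """Payoff to `me` against `other`; True means cooperate."""
    if me:
        return R if other else S
    return T if other else P


def total_payoff(node, strategy, neighbours):
    """Sum of pair-wise game payoffs over all current links of `node`."""
    return sum(payoff(strategy[node], strategy[n]) for n in neighbours[node])


def rewire(node, strategy, neighbours, rng):
    """Adaptive step (assumed rule): drop one link to a defector and
    replace it with a link to a random non-neighbour."""
    defectors = sorted(n for n in neighbours[node] if not strategy[n])
    candidates = [n for n in strategy
                  if n != node and n not in neighbours[node]]
    if not defectors or not candidates:
        return False
    old, new = rng.choice(defectors), rng.choice(candidates)
    neighbours[node].discard(old)
    neighbours[old].discard(node)
    neighbours[node].add(new)
    neighbours[new].add(node)
    return True


def imitate(node, strategy, neighbours, rng):
    """Strategy step (assumed rule): copy a random neighbour's strategy
    if that neighbour currently earns strictly more."""
    if not neighbours[node]:
        return
    model = rng.choice(sorted(neighbours[node]))
    if (total_payoff(model, strategy, neighbours)
            > total_payoff(node, strategy, neighbours)):
        strategy[node] = strategy[model]


def simulate(n=30, k=4, steps=2000, rewire_prob=0.5, seed=0):
    """Start from a ring lattice of degree k with random strategies, then
    interleave link rewiring and strategy imitation."""
    rng = random.Random(seed)
    strategy = {i: rng.random() < 0.5 for i in range(n)}
    neighbours = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k // 2 + 1):
            neighbours[i].add((i + j) % n)
            neighbours[(i + j) % n].add(i)
    for _ in range(steps):
        node = rng.randrange(n)
        if rng.random() < rewire_prob:
            rewire(node, strategy, neighbours, rng)
        else:
            imitate(node, strategy, neighbours, rng)
    return strategy, neighbours
```

The parameter rewire_prob sets the relative speed of network adaptation and strategy change; varying it is one simple way to explore how link dynamics alter the effective game being played.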
Climate Change Governance, Cooperation and 
When attempting to avoid global warming, individuals often face a social dilemma 
in which, besides securing future benefits, it is also necessary to reduce the chances 
of future losses. Unfortunately, individuals, regions or nations may opt to be free 
riders, hoping to benefit from the efforts of others while choosing not to make any effort 
themselves. Moreover, nations and their leaders seek a collective goal that is shadowed 
by the uncertainty of its achievement. Such types of uncertainties have repeatedly 
happened throughout human history, from group hunting to voluntary adoption of public 
health measures and other prospective choices. In this talk, I will discuss a population 
dynamics approach to a broad class of cooperation problems in which attempting 
to minimize future losses turns the risk of failure into a central issue in individual 
decisions. Our results suggest that global coordination for a common good should be 
attempted by segmenting tasks in many small to medium sized groups in which 
perception of risk is high. Moreover, whenever the perception of risk is low, as is 
presently the case, we find that a polycentric approach involving multiple institutions is 
more effective than that associated with a single, global one, indicating that a bottom-up 
approach, set up at a local scale, provides a better ground on which to attempt a solution 
for such a complex and global dilemma. Finally, I will discuss the impact on public 
goods dilemmas of uncertainty in collective goals, heterogeneous political networks, 
obstinate players and wealth inequality, including a distribution of wealth 
representative of existing inequalities among nations. 
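
The role that perceived risk plays in such dilemmas can be made concrete with a small Python sketch of a threshold game. The abstract does not state its payoff structure, so everything here is an assumption for illustration: each group member holds an endowment b, cooperators pay a fraction c of it, and if fewer than M members cooperate, all members lose what they have left with probability r. The function name expected_payoffs and its parameters are hypothetical.

```python
def expected_payoffs(k, M, b=1.0, c=0.1, r=0.5):
    """Expected payoffs (cooperator, defector) in a group with k cooperators.

    Assumed rules: if k >= M the collective target is met and everyone keeps
    their remaining endowment; otherwise everyone keeps it only with
    probability 1 - r, where r is the perceived risk of collective loss.
    """
    kept_fraction = 1.0 if k >= M else 1.0 - r
    defector = b * kept_fraction
    cooperator = defector - c * b  # cooperators have paid the contribution
    return cooperator, defector


def pivotal_gain(M, b=1.0, c=0.1, r=0.5):
    """Gain to a player from cooperating rather than defecting when exactly
    M - 1 others already cooperate, so that this player decides the outcome."""
    coop_if_join, _ = expected_payoffs(M, M, b, c, r)
    _, defect_if_not = expected_payoffs(M - 1, M, b, c, r)
    return coop_if_join - defect_if_not
```

Under these assumptions the pivotal gain equals b(r - c), so contributing pays for a pivotal player only when the perceived risk r exceeds the cost fraction c, which is one way to read the claim that low risk perception undermines cooperation.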
Do Endothelial Cells Dream of Eclectic Shape? 
Endothelial cells (ECs), which line our blood vessels, exhibit dramatic plasticity 
and diversity of form/behavior at the individual and collective cell level. They 
reorganize themselves in space and time to extend new blood vessel networks during 
development and during a huge array of diseases including cancer. Here we will 
describe, using examples from our integrated in silico/in vitro/in vivo research program, 
how the Artificial Life (ALife) perspective and approaches have been paramount in 
driving entirely new experimental biology understanding of the vasculature by 
capitalizing on the emergent, predictive capacity and testable nature of agent-based 
models in close combination with in vitro and in vivo experiments. 

Our agent-based simulations explicitly consider the role of individual EC 
embodiment, active perception, heterogeneous vs homogeneous collective dynamics, pattern 
formation and counter-intuitive emergence from feedback in “controller” networks and 
many more ALife-centric concepts. We recently identified in silico that the time it takes 
ECs to collectively decide who should move and who should stay during blood vessel 
branching morphogenesis can be varied by altering tissue environment conditions, 
including some changes found in tumors. By proceeding to validate these predictions in 
vitro and in vivo, integrating the studies in the wetlab, we have been able to provide 
a solid new mechanism to explain the diversity of vascular network structures found 
across tissues and the malformations arising in disease. 

There is a bright future with untapped potential for the ALife community to further 
contribute to understanding of animals, including humans, at the cell and tissue level, 
where many organizational principles of the system's behavior are still lacking. If we 
take care to be rigorous in how we calibrate our models to biological data and make 
clear experimentally testable predictions, we will show we can make real change in an 
experimental cell biology field, traditionally segregated from in silico research. 
Learning from the plight of the insightful, but ostracized, Androids in Philip K. Dick's 
novel, overcoming our cultural differences and integrating better between the artificial 
and natural living systems research communities could lead to huge advantages in 
achieving our common goals to “understand life as it is”. 
Artificial Life and Society: Philosophies and Tools for Experiencing, Interacting 
Abstract 

Many of the grand challenges that society faces are concerned 
with understanding, managing and indeed creating complex 
living, lifelike or hybrid systems at multiple scales. 
Conventional approaches have often been unsuccessful in 
dealing with the inherent non-linearity, adaptability and self- 
organised behaviours of these systems. In fact the underlying 
technologies often transform the involved organizations and 
society as a whole. New paradigms are clearly required and 
the ALife community can play a key role. Artificial Life offers 
both tools and philosophical approaches which are well 
matched to the nature of these systems and can provide 
important perspectives on how to move forward. 

Many of the most important challenges which now face 
our societies involve the management of interlinked 
complex adaptive systems (CAS): coupled socio-economic 
and ecological systems composed of many 
interacting elements which have been created or 
partially created by human actions. As we explicitly 
wish to manage and transform these systems, 
engineering and design approaches have much to offer 
us; however, they must be fundamentally modified to 
deal with CAS (see e.g. Penn et al., 2010; Frei and Di 
Marzo Serugendo, 2012). These systems are not static 
artifacts, but dynamic, evolving and reflexive processes 
the behaviour of which is not straightforwardly 
predictable and which may respond in unexpected ways 
to our interventions. Additionally, many of 
the complex systems which we would most like to 
influence have significant social components. Objective 
choices about design goals cannot be made and the 
integration of participatory or political processes may be 
required. In parallel, new technologies that exploit or 
emulate the unique properties of living systems from the 
cellular to the digital realm have great potential, but 
create new engineering challenges and social dilemmas 
which must be addressed before they can become 
broadly utilized. 

Conventional approaches to working with both living 
and life-like complex adaptive systems are, for the most 
part, “brute force”, attempting to effect control in an 
input- and effort-intensive manner, and are often 
insufficient when dealing with their inherent 
non-linearity and complexity. Much human management of 
complex adaptive systems/living systems has involved 
simplifying their dynamics or functions via large inputs 
of energy or work. Ecosystems, for example, are 
commonly constrained and managed for a minimum set 
of ecosystem functions, e.g. arable agricultural systems, 
fisheries and flood risk management in wetlands. Doing 
so may involve forcing systems into states that are far 
from natural equilibria and hence inherently unstable 
and requiring significant energy to maintain. System 
management of this type is ineffective in low 
energy/resource regimes and may be vulnerable to 
sudden state change. It offers an illusion of 
predictability and control, but is vulnerable when 
external drivers change (Deffuant and Gilbert, 2011). 
Evidently, by their very nature, CASs are dynamic, 
adaptive and resilient and require management tools that 
interact with dynamic processes rather than inert 
artefacts, particularly when, as now, systems face 
increased perturbation and change. By using “steering” 
strategies which recognize their dynamical nature, we 
can attempt to manipulate these systems to move 
between attractors, allowing them to remain stable in a 
preferable state without significant energy needed to 
retain it. 

However, a dynamical systems perspective is not 
sufficient for real world problem solving. On the ground 
in complex adaptive systems with human components or 
influence, key drivers are often social, political or 
economic (Gilbert and Bullock, 2014). Our range of 
possible interventions is limited. Not only do we need 
different ways to manage our systems which embrace 
their complexity, we need broadly accepted narratives of 
systems as complex and adaptive to help us to shape 
policy and management, and we must be prepared to 
take action without full system understanding. We must 
accept, explicitly recognize and be able to make 
decisions with incomplete knowledge, and require tools 
and methodologies for steering, monitoring and 
gathering knowledge which will allow us to adapt as 
systems respond to our intervention in complex adaptive 
systems which we for the most part experience and 
intuit rather than measure (Rowley et al., 1997; Kay et 
al., 1999; Waltner-Toews and Kay, 2005). 
Artificial Life provides unique perspectives, tools and 
philosophies, which can offer both technical and 
methodological approaches to understanding and 
intervening in complex systems subject to “wicked” 
problems. And with our creation of new living, life-like 
and intelligent technologies, it plays a part in 
constructing the complex systems of the future (see 
e.g. Braha et al., 2006; Bedau et al., 2010). The time is 
ripe, therefore, for the Artificial Life community to 
collate ideas on our technologies and approaches and 
their possible societal contributions and impact. 

I will give an overview of the potential contribution 
that I believe that ALife can make and the need to 
connect productively with many different disciplines. In 
particular, ALife’s focus on interaction, 
embodiment, dynamics, enactivism and 
phenomenological approaches, combined with its 
inherent creativity, focus on synthetic methods and 
“philosophy with a screwdriver”, provides what I believe 
is the perfect basis for engaging with real world 
complex systems. 

To illustrate these connections I will discuss 
numerous examples, including some from my own work 
in developing participatory complexity science tools for 
use by policy makers and system stakeholders in both 
regional industrial economies (“industrial ecosystems”) 
and water catchment management in the UK (Penn et 
al., 2013, 2014, 2016). In particular, I will discuss a “steering 
complex adaptive systems” approach: a continuous 
process involving interacting with, monitoring and 
learning from the system in question, which combines 
tools from complexity science and participatory 
methods with whole systems design philosophy and 
adaptive management (Penn, Forthcoming). In this 
particular context I will describe some of the 
experiences and challenges of combining mathematical 
modelling and analysis with participatory work in the 
context of rapidly changing real world systems and how 
an ALife perspective has informed the work. I will 
further discuss the potential role of experiential complex 
systems and ask what an interactive “natural history” 
approach to complex adaptive systems can offer in 
combination with mathematical and computational 
tools. 
Abstract 

Policy-relevant scientific models are typically expected to 
make empirically valid predictions about policy-relevant 
problems. What are the consequences of shaping our 
science-policy interface in this way? Here, it is argued that the 
theoretically insecure simulation modelling pioneered within 
artificial life is emblematic of an important alternative approach 
with significance for policy-relevant modelling. 

The 21st century brings a raft of significant systemic 
challenges: global finance, global climate, global technology, 
global security, global governance, global sustainability, etc. 
Meeting these challenges will involve understanding and 
managing complex systems that comprise many interacting 
parts (Gilbert and Bullock, 2014). Computer simulation is 
emerging as the key scientific tool for dealing with such 
systems (e.g., Farmer and Foley, 2009). As a consequence, 
simulation science is increasingly political and has the 
potential to be critical to our future well-being and quality of life 
(e.g., Edwards, 2010). To what extent do our current 
simulation modelling practices measure up to the responsibility 
that they must now shoulder? 

The assumed gold standard for simulation modelling is 
that our models should achieve a (pseudo)-empirical status 
(Peck, 2004) that allows their results to be understood as 
secure forecasts, or valid predictions about their real-world 
target systems (but see Epstein, 2008). Where simulations have 
real-world political significance, it is expected that their 
predictions and forecasts can inform the decisions of policy 
makers, stakeholders, etc. Such models are analogous to the 
simulated wind tunnels within which new designs of cars or 
planes are trialled before deciding whether they should be 
constructed and sold in the real world (e.g., Bullock, 2011). 

Why would we want a science-policy interface structured 
in this way: one in which the flow from policy to science 
defines “challenges” and “impact” while the flow in the other 
direction takes the form of predictions and forecasts? Here, 
the merits of an alternative science-policy interface will be 
considered; one in which simulation models that do not rely 
upon empirical validity are built and explored in order to 
generate insights into our understanding of policy-relevant 
target systems (Di Paolo et al., 2000). 

I argue that, in the same way that Drosophila 
melanogaster and C. elegans have played a useful role 
within biology as model organisms, Artificial Life has the 
potential to be a model discipline within academia, 
epitomising the use of simulation to deliver insights into what 
may be as opposed to predictions of what will be. 

There are several points in favour of this position. First, 
the field is explicitly located at the boundary between the 
artificial and the real (Silverman and Bullock, 2004); 
second, the research community has explicitly debated the role 
of simulation and its epistemological status (e.g., Wheeler 
et al., 2002); and third, crucially, it is generally accepted that 
most artificial life models lack theoretical security in the 
following sense. Theoretically secure models are underwritten 
by mature and consensually agreed upon theory (e.g., the 
Navier-Stokes equations for fluid flow). Insecure models do 
not benefit from such a mature theoretical underpinning, but 
rather, are exploratory attempts to generate or progress such 
theory. Perhaps uniquely, the field of artificial life 
deliberately courts insecurity of this kind, by tackling 
counterfactual questions (“life-as-it-could-be”) for which there is, 
almost by definition, little theoretical basis. Similarly, the 
related field of complexity science can be regarded as an 
effort to address questions concerning emergent phenomena 
that lie outside the security provided by reductionist science. 

Research into the policy-relevant systemic questions with 
which this paper opened currently tends to be deeply 
insecure in the sense described above. It may not always be 
this way since systemic problems are not necessarily 
inherently insecure. Indeed, it is to be hoped that progress on the 
problems of climate change, financial stability, etc., might 
provide more secure foundations for policy making in the 
near future. However, presently, it is not only more facts 
and data pertaining to these problems that are required but 
more and better theory with which to marshal and make 
sense of these data. Consequently, it is important to learn 
lessons from modelling and simulation efforts that may 
appear “worse” in terms of empirical validity, but may be 
“better” in terms of generating the insights and understandings 
that fuel new and better theory. 

In outlining this style of simulation, I will draw on an 
analysis of Charles Babbage’s simulation model of miracles, 
the first example of a simulation model (Bullock, 2000), and 
its subsequent impact on attempts to build machines capable 
of automatic economic reasoning (Bullock, 2008), and an 
analysis of Levins’ position on the trade-offs inherent in 
modelling living systems (Levins, 1966; Bullock, 2014). 

As a concrete example of this type of model, consider 
Schelling’s (1971) well-known model of urban segregation, 
which demonstrates that an initially well-mixed population 
of city dwellers of various types, each with a tendency to 
relocate at random unless they are adjacent to at least a few 
similar individuals, tends over time to become strongly 
spatially segregated by type. The insight here is that real-world 
segregated populations may not be made up of 
individuals with explicit preferences for segregation; the 
“micromotives” of the agents may not be reflected straightforwardly in 
their aggregate, population-level patterns. Schelling’s model 
is not empirically valid, makes no forecasts, and is not 
intended to represent a specific city or population. Nevertheless 
it delivers an insight that is compelling, transparent and, 
consequently, capable of helping to shape better social policy. 
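
A model of this kind is small enough to sketch in a few lines of Python. The version below is a simplified variant written for illustration, not Schelling's original specification: the torus grid, two agent types, 20% vacancy, the threshold of three like neighbours out of eight, and all function names are assumptions.

```python
import random


def make_grid(size, empty_frac, rng):
    """Random size x size grid of 'A'/'B' agents; None marks a vacant cell."""
    cells = [None if rng.random() < empty_frac else rng.choice("AB")
             for _ in range(size * size)]
    return [cells[r * size:(r + 1) * size] for r in range(size)]


def neighbours(grid, r, c):
    """Yield the eight surrounding cells, wrapping around the grid edges."""
    n = len(grid)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr or dc:
                yield grid[(r + dr) % n][(c + dc) % n]


def is_content(grid, r, c, min_similar):
    """An agent stays put if at least min_similar neighbours share its type."""
    me = grid[r][c]
    return sum(1 for x in neighbours(grid, r, c) if x == me) >= min_similar


def similarity(grid):
    """Mean fraction of each agent's occupied neighbours that share its type."""
    fracs = []
    for r, row in enumerate(grid):
        for c, me in enumerate(row):
            if me is None:
                continue
            occupied = [x for x in neighbours(grid, r, c) if x is not None]
            if occupied:
                fracs.append(sum(x == me for x in occupied) / len(occupied))
    return sum(fracs) / len(fracs)


def step(grid, min_similar, rng):
    """Move every agent that was discontent at the start of the step to a
    randomly chosen vacant cell; return how many agents were discontent."""
    n = len(grid)
    movers = [(r, c) for r in range(n) for c in range(n)
              if grid[r][c] is not None
              and not is_content(grid, r, c, min_similar)]
    empties = [(r, c) for r in range(n) for c in range(n)
               if grid[r][c] is None]
    rng.shuffle(movers)
    for r, c in movers:
        if not empties:
            break
        i = rng.randrange(len(empties))
        er, ec = empties[i]
        grid[er][ec], grid[r][c] = grid[r][c], None
        empties[i] = (r, c)  # the cell just left becomes vacant
    return len(movers)


def run(size=20, empty_frac=0.2, min_similar=3, steps=50, seed=1):
    """Return the similarity index before and after the relocation dynamics."""
    rng = random.Random(seed)
    grid = make_grid(size, empty_frac, rng)
    before = similarity(grid)
    for _ in range(steps):
        if step(grid, min_similar, rng) == 0:
            break
    return before, similarity(grid)
```

Calling run() and comparing the two returned values shows the point made in the text: agents who only require a minority of like neighbours still produce a population whose neighbourhoods are, on average, more homogeneous than at the start.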

Conversely, one might imagine many more sophisticated 
and more data-driven models of, say, Chicago’s economy 
(including representations of its housing market, land prices, 
geography, etc.) and population (including accurate 
demographic information, and analogues of relevant 
psychological and social processes, etc.). Such models might be able 
to achieve a degree of empirical validity in the sense of 
being able to “hind-cast” Chicago’s historical patterns of 
segregation to some degree of accuracy. Nevertheless, despite 
the ready ability of such models to generate specific 
forecasts and point predictions with the potential to strongly 
determine associated policy decisions, they would (given our 
relatively poor understanding of the fundamental systems 
involved) be both insecure and opaque, and, consequently, 
problematic as policy tools. 

The groundwork outlined above enables us to engage with 
three key questions: Might a re-examination of simulation 
modelling enable a science-policy interface to be 
restructured in order to allow insights to pass across it? To what 
extent are accountability and democracy impacted by the 
current character of the science-policy interface? Could 
epistemically insecure simulation models of the kind 
pioneered within artificial life research offer a paradigm within 
which an effective reassessment of impact-driven science 
takes place? The answers argued for are, 
respectively: “yes”, “badly”, and “perhaps”. 

Acknowledgments: Thanks to audiences at RMIT, 
Melbourne and “Knowing and Understanding Through 
Computer Simulations”, Paris, for useful feedback. 
Abstract 

Artificial Life is concerned with understanding the dynamics of 
human societies. A defining feature of any human society is its 
institutions. However, defining exactly what an institution is 
has proven difficult, with authors often talking past each other. 
This paper presents a dynamic model of institutions, which 
views institutions as political game forms that generate the 
rules of a group’s economic interactions. Unlike much prior 
work, the framework presented in this paper allows for the 
construction of explicit models of the evolution of institutional 
rules. It takes account of the fact that group members are likely 
to try to actively create institutional rules that benefit 
themselves at the expense of others. The paper finishes with an 
explicit example of how a model of the evolution of 
institutional rewards and punishment for promoting cooperation 
can be created. It is intended that this framework will allow 
Artificial Life researchers to address how human groups can 
create conditions that support cooperation. This will help to 
both provide a better understanding of historical human social 
evolution, and help in understanding the resolution of pressing 
public goods problems such as climate change. 

Introduction 

Artificial Life is concerned with the simulation and synthesis 
of living systems. One key type of living system that Artificial 
Life seeks to understand through simulation and synthesis is 
human social organization. The goals behind this are many 
and varied, from wanting to better understand the ecological 
and social pressures that historically transformed human 
groups from egalitarian hunter-gatherers to hierarchical 
chiefdoms and states, to being able to devise incentive 
schemes to prevent climate change, to being able to engineer 
artificial systems that autonomously adapt their social 
organization to changing conditions. All of these efforts lie at 
the interface with a number of other disciplines that are 
concerned with understanding human social organization, 
including anthropology, archeology, artificial intelligence, 
economics, evolutionary biology, primatology, political 
science, and psychology. 

In this paper, I review the different approaches that have 
been used to model the cultural evolution of human societies. 
I argue for the merits of an institutional approach. Following 
Hurwicz (1996), I define institutions as political game forms 
that generate the rules of a group’s economic interactions. 
This is in contrast to other work that has tended to define 
institutions either as equilibrium behavior within a society, or 
as the rules of the economic interactions themselves. Instead, I 
show that by viewing institutions as political game forms that 
generate these rules, we can develop dynamic models of how 
societies change over time, allowing us to better address the 
goals of Artificial Life researchers. 

Two big questions about human societies 

When we look at human societies, two big features stand out 
as being in particular need of explanation. The first is the 
high level of cooperation and coordination between unrelated 
individuals. Compared to other primates, humans are unique 
in depending upon exchange with other individuals for nearly 
all of their vital resources. In economics, this high degree of 
interdependency is known as catallaxy, and contrasts heavily 
with the autarky and self-reliance of other primates. 
Strikingly, the degree of interdependence has increased over 
time from the first hunter-gatherers through to modern day 
states (North, 1990). For hundreds of thousands of years, 
humans lived as hunter-gatherers, obtaining resources by 
hunting large animals and gathering plant materials (Marlowe, 
2005). Studies of extant hunter-gatherer groups imply that 
ancient hunter-gatherer groups practiced extensive food 
sharing between camp members (Boehm, 1999), and that 
there was a marked division of labor between males who 
hunted large animals, providing protein, and females who 
gathered plants, providing carbohydrates (Marlowe, 2007). 
With the Neolithic origin of agriculture that began circa 
10,000 years ago, division of labor further increased, with 
some individuals specializing entirely in tasks unrelated to 
food production, such as producing crafts (Oka & Kusimba, 
2008). Where we see such high levels of specialization 
elsewhere in the biological world, it is only in cases where 
there is a very high genetic relatedness between group 
members, as exemplified by eusocial insect colonies. In such 
cases, the division of labor is coordinated by means of a 
common genetic program carried by each individual. But in 
human societies, division of labor and exchange occurs 
between unrelated individuals that may never meet again, 
what Paul Seabright (2010) calls “A company of strangers”. 
This creates all kinds of opportunities for one party to cheat 
on an exchange (North, 1990), while the fact that interactions 
in modern societies are between unrelated individuals who 
may never meet again is problematic for traditional 
explanations for cooperation based upon kinship and 
reciprocity. 

The second key feature of human societies is their 
transition between egalitarian and hierarchical modes of social 
organization (Currie et al., 2010). Both anthropological and
archeological evidence imply that the first human social
groups were egalitarian hunter-gatherers. Anthropological 
studies of modern hunter-gatherer groups show that decisions 
are invariably reached by forming a group consensus,
with each individual being allowed to voice its opinion in a 
group-wide discussion (Boehm, 1999). While such groups do 
have leaders, the role of leaders is not to coerce others or 
monopolize the discussion, but rather to facilitate turn-taking 
and help the group reach a consensus. Archeological evidence 
of burial sites similarly reveals little status differentiation 
when individuals were buried (Price, 1995). 

By contrast, the transition to agriculture was accompanied 
by a shift to hierarchical social organization, with a small 
number of individuals exhibiting high status. Evidence from 
burial sites shows that leaders started to be buried with 
valuable grave goods such as obsidian, and were not buried 
alongside other group members as had occurred previously 
(Price, 1995). Hierarchy was manifested both in resource 
inequality, and in inequality in decision-making, with leaders 
at the top of the hierarchy coercing the rest of the group to 
follow their decisions. The archeological evidence points to 
the first hierarchical societies being chiefdoms, with a single 
level of hierarchy, i.e. a chief presiding over commoners. The 
origin of states around 4000 years ago is defined in terms of a 
shift to multiple levels of hierarchy, with rulers creating 
specialized administrative positions between themselves and 
the commoners (Spencer, 2010). This represents a new form 
of division of labor and specialization, where some 
individuals specialize in administering the group. 

What we see in human evolution, then, is a gradual increase 
both in hierarchical organization, and in the degree of division 
of labor and specialization. These co-occur with an increase in 
group size. Hunter-gatherer bands would have numbered no 
more than the hundreds. Cemetery evidence shows that the 
origin of agriculture brought about a massive increase in 
fertility (Bocquet-Appel, 2011), while further evidence 
suggests that the population density of early agriculturalists 
may have been up to 40 times larger than that of hunter- 
gatherers (Hassan & Sengel, 1973). This is supported by 
evidence that the first cities arose during this period. Finally, 
in modern states economic interactions occur between 
millions of individuals. To understand societies, what 
Artificial Life needs is a dynamic model of how cooperation, 
hierarchy, and group size co-evolve. In the next section, I 
introduce the critical role that institutions play in this. 


Institutions 

What do economic interactions within groups look like? In 
modern groups, individuals take part in a range of 
interactions, from bilateral exchange through to the 
production and maintenance of public goods upon which the 
whole group depends, such as clean air. These have 
traditionally been modeled as pairwise reciprocity, and N- 
player public goods games, respectively. However, these 
models abstract away from the fact that human economic 
interactions are universally governed by rules. These rules 
change what the optimal economic behavior for self-interested 
individuals is. The rules are created by institutions, and are 
referred to here as institutional rules. Institutions, in turn, are 
the processes that create the economic rules. 


Figure 1: Human social evolution (adapted from
Powers, van Schaik & Lehmann, 2016).

Institutions and institutional rules are not an invention of
modern society; they exist even in hunter-gatherer groups
(Kaplan et al., 2005). For example, extant hunter-gatherer
groups have rules specifying who may take part in hunting an
animal, who gets to keep which part of the kill, how the food
will be shared back at the camp, et cetera (Hill, 2009).
Similarly, the origin of agriculture necessitated the creation of 
rules of property rights, to prevent one individual from simply 
having their crops taken by another (Bowles & Choi, 2013). 
Agriculture would also have required rules to regulate the 
construction and usage of new collective goods such as 
irrigation systems; such rules are seen in extant small-scale 
farming villages (Ostrom, 1990). Finally, trade in the 
medieval period required rules to allow a trader to ascertain 
the reputation of new trade partners, as in the Law Merchant 
system in Europe (Greif, 2006). With regard to the present, it 
has been argued that institutions are the main determinant of 
whether whole nations succeed or fail (North, 1990; 
Acemoglu & Robinson, 2011). 

The processes by which institutional rules are created have
also changed over the course of human social evolution.
Although institutional rules typically change slowly, over
many generations, they are nevertheless not the result of
random drift-like processes, but instead are actively shaped by
group members pursuing their own interests. Specifically, we
should expect each group member to try to create institutional
rules that will benefit itself and its kin. In extant
hunter-gatherer groups, institutional rules are routinely discussed by
all group members around the camp fire (Boehm, 1999). By 
contrast, with the rise of agriculture leaders started to 
dominate the creation of institutional rules, creating rules that 
benefitted themselves (e.g. by reinforcing inequality) at the 
expense of the rest of the group. 

The story of human social evolution, then, is a story about 
how institutions and institutional rules have changed over 
time (Powers, van Schaik & Lehmann, 2016). How have 
institutional rules been created that allow for successful trade 
between individuals who may never meet again (North, 
1990)? Why have some groups been able to create 
institutional rules that move their economic game form away 
from the Tragedy of the Commons when sharing common 
resources such as irrigation systems or fisheries (Ostrom, 
1990)? And why did the processes that create a group’s 
institutional rules change from egalitarian in hunter-gatherers, 
to extremely hierarchical in the first states? 

Where institutional rules are included in models, they 
usually take the form of rewards for cooperative behavior, or 
punishment for uncooperative behavior. But this is often done 
by assuming that each individual alone makes a unilateral 
decision about whether to punish or reward another group 
member (so-called “peer-punishment” and “peer-rewarding”), 
and pays a cost on its own for doing so. However, in reality
rewards and punishment follow agreed rules and are carried out
in a coordinated manner by the whole group, so that no one
individual bears the cost alone (Baumard, 2010; Guala, 2012;
Powers & Lehmann, 2013).

The important question is then, how are the institutional 
rules formed? Very few models have actually looked at this 
question. The few models that have looked at coordinated 
rewards and punishment have often assumed that the reward 
or punishment scheme is determined exogenously by 
processes outside of the model. While this approach is useful 
for looking at the effects of various institutional rules, it 
cannot address how or why institutional rules change over 
time. What we need is a model of the evolution of institutional 
rules, a dynamic model that accounts for how institutional 
rules adapt to changing ecological conditions (Ostrom, 2005). 

A framework for modeling the creation of 
institutional rules 

Hurwicz (1996) provides a general model for this. Hurwicz 
defines an institution as a political game form, which sets the 
rules for a subsequent economic game form. In game theory, a 
game form consists of the set of allowed strategies plus the 
mapping between strategies and outcomes. A game then 
consists of the game form plus the individual preferences over 
outcomes, i.e. the players’ utility functions. Separating the
game form from the game is useful because the game form 
represents the parts that can be changed by institutional rules, 
i.e. the parts that are malleable to human intervention 
(Hurwicz, 1996). In the political game form, the individual 
strategies consist of messages, and the outcomes consist of 
rules. The material payoffs that individuals earn are then 
determined by playing an economic game form, such as a 
public goods game, that is governed by these rules. For 
example, the political game form may consist of individuals 
agreeing that each group member should contribute a certain 
amount to the public good, and that any individual that
contributes less than this will be punished by an agreed
amount. Material payoffs are then assigned by playing the 
public goods game with these rules (Figure 2). 

Figure 2: An institution is represented by a political game
form, which determines the rules for subsequent economic
interactions.

In the presence of an institution then, individuals engage in
two stages of social interactions, where the first (political) sets
the rules for the second (economic). Different sets of
institutional rules generated in the political game form will 
change the way that self-interested individuals will behave in 
the economic game form. In other words, the results of the 
political game form will determine whether cooperation is 
favored or not. 
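
The two-stage structure can be made concrete with a minimal Python sketch of the example given above: a political game form that turns messages (proposed rules) into agreed rules, followed by an economic game form that turns contributions into material payoffs under those rules. The names `Rules`, `political_game_form` and `economic_game_form`, the use of a simple average to settle on the rules, and the `multiplier` parameter are all illustrative assumptions; they are not taken from Hurwicz (1996).

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Rules:
    """Institutional rules for a public goods game (hypothetical structure)."""
    required_contribution: float  # agreed minimum contribution
    fine: float                   # agreed punishment for contributing less


def political_game_form(messages: Sequence[Rules]) -> Rules:
    """Map messages (each member's proposed rules) to the adopted rules.

    Assumption: the outcome is the plain average of all proposals.
    """
    n = len(messages)
    return Rules(
        required_contribution=sum(m.required_contribution for m in messages) / n,
        fine=sum(m.fine for m in messages) / n,
    )


def economic_game_form(contributions: Sequence[float], rules: Rules,
                       multiplier: float = 3.0) -> List[float]:
    """Public goods game played under the agreed rules.

    Each member receives an equal share of the multiplied pot, pays its own
    contribution, and pays the agreed fine if it contributed too little.
    """
    n = len(contributions)
    share = multiplier * sum(contributions) / n
    return [
        share - c - (rules.fine if c < rules.required_contribution else 0.0)
        for c in contributions
    ]


# Stage 1 (political): two members propose rules; stage 2 (economic): they play.
agreed = political_game_form([Rules(1.0, 2.0), Rules(1.0, 4.0)])
payoffs = economic_game_form([1.0, 0.0], agreed)
```

Keeping the two functions separate mirrors the distinction between game form and game: only the mapping from strategies to outcomes is encoded here, while preferences over those outcomes would be supplied separately.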

What might the political game form look like? In hunter- 
gatherer groups, it is typically of an egalitarian nature where 
the preferences of all group members are taken account of 
(Boehm, 1999). From a modeling point of view, this could be 
operationalized by forming institutional rules by taking some 
aggregate of the preferences of each group member for the 
rules. By contrast, with the origin of agriculture, and 
subsequently the first states, political game forms became 
much less egalitarian (Price, 1995; Earle, 1997). Through 
unequal access to resources, leaders became able to dominate 
the political game form and create rules that benefitted 
themselves at the expense of others. An example of this is 
institutional rules that determine how the surpluses resulting 
from agriculture are distributed within groups. In hunter- 
gatherers, institutional rules meant that food was shared 
relatively equally within groups (Boehm, 1999). With the 
transition to agriculture, however, despotic leaders created 
rules of distribution in which most resources went to 
themselves and their kin (Powers & Lehmann, 2014). In these 
cases, models of the political game form should give weight to 
the amount of resource that a group member has, in contrast to 
the egalitarian political game form appropriate for modeling 
hunter-gatherer groups. 
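
One way to operationalize the contrast between egalitarian and resource-weighted political game forms is as a weighted average of individual preferences. The sketch below is only illustrative: the function name `aggregate_preferences`, the choice of a linear weighting by resource holdings, and the example numbers are assumptions rather than part of any model cited here.

```python
from typing import Optional, Sequence


def aggregate_preferences(preferences: Sequence[float],
                          weights: Optional[Sequence[float]] = None) -> float:
    """Aggregate individual rule preferences into a single institutional rule.

    With weights=None every member counts equally (egalitarian form).
    Passing resource holdings as weights gives wealthier members more say
    (one possible hierarchical form; the linear weighting is an assumption).
    """
    if weights is None:
        weights = [1.0] * len(preferences)
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(p * w for p, w in zip(preferences, weights)) / total


# Preferred share of the surplus going to the leader (first entry is the leader).
prefs = [0.9, 0.1, 0.1, 0.1]
resources = [7.0, 1.0, 1.0, 1.0]

egalitarian_rule = aggregate_preferences(prefs)              # about 0.30
hierarchical_rule = aggregate_preferences(prefs, resources)  # about 0.66
```

With identical preferences, the only thing that changes between the two calls is the political game form, which is the point of separating it from the economic game form.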

This raises an important question: how do political game
forms change over time? The political game form itself has 
rules, determining how the preferences of individuals result in 
rules for economic interactions. These rules can themselves 
change. In the general model of Hurwicz, the rules of the 
political game form are set by a preceding game form, which 
can be thought of as a “constitutional game form” (Ostrom, 
2005). The constitutional game form might model, for 
example, a transition between egalitarian and hierarchical 
interactions within groups. Of course, the rules for the 
constitutional game form themselves have to come from 
somewhere, and they may be set by another preceding game 
form. However, there will not be an infinite regress of game 
forms, because eventually the rules will be given by 
unchangeable aspects of the environment, such as the total 
amount of resources available to individuals, and the laws of 
physics (Hurwicz, 1996; Ostrom, 2005). 

One criticism of the Hurwicz model might be that in reality 
institutions change very slowly, and that institutional 
evolution is highly path dependent. The model presented here 
can take account of this, however. In particular, the political 
game form does not have to be played on the same timescale 
as the economic game form. For example, the economic game 
form may be played many times over the course of a 
generation, while the political game form may only be played 
once every several generations. Further, the political game 
form takes account of path dependence because it is 
constrained by rules set by the constitutional game form, 
which will typically be played even less frequently. In this 
way the model combines intentional change, where self- 
interested actors actively try to create rules to benefit 
themselves, with historical contingencies. The balance 
between the effect of historical contingencies and the effect of 
intentional action is an empirical question that can only be 
determined by examining the institutions in question. 
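
The separation of timescales described above can be sketched as a simple schedule in which the economic game form is played several times per generation, the political game form once every few generations, and the constitutional game form more rarely still. The function name `play_schedule` and all period values below are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def play_schedule(generations: int, economic_rounds: int,
                  political_period: int,
                  constitutional_period: int) -> List[Tuple[int, str]]:
    """Return (generation, game form) events in the order they are played.

    Higher-level game forms are played first within a generation, because
    they set the rules for the lower-level ones.
    """
    events: List[Tuple[int, str]] = []
    for g in range(generations):
        if g % constitutional_period == 0:
            events.append((g, "constitutional"))
        if g % political_period == 0:
            events.append((g, "political"))
        events.extend((g, "economic") for _ in range(economic_rounds))
    return events


# Illustrative values: 3 economic rounds per generation, political rules
# revisited every 4 generations, constitutional rules every 12.
events = play_schedule(generations=12, economic_rounds=3,
                       political_period=4, constitutional_period=12)
```

Between political rounds the rules simply persist, which is one way the framework can represent slow, path-dependent institutional change.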

Comparison with other approaches to modeling 
institutions 

The two main approaches in the literature have been to view 
institutions either as the rules of the economic interactions 
themselves (e.g. North, 1990; Ostrom, 1990), or to view 
institutions as equilibrium patterns of behavior within groups 
(e.g. Richerson & Henrich, 2012). The problem with both of 
these approaches is that they struggle to explain institutional 
change. Viewing institutions as rules recognizes that they can 
be produced by intentional action. In other words, it 
recognizes that institutions are the means by which humans 
create their economic interactions (North, 1990). However, 
we also need a model for the processes that generate the rules. 
Following Hurwicz (1996), it is argued here that the essence 
of an institution is a political game form that generates rules, 
as well as the rules themselves. 

In cultural evolution models, it is common to view 
institutions as equilibria (see e.g. Richerson & Henrich, 2012). 
The idea here is that different social groups reach different 
stable equilibria (for example as modeled by Boyd & 
Richerson, 1990), i.e. settle on different institutions. This is 
compatible with the model presented here to the extent that 
different institutional rules, i.e. different outcomes of the 
political game form, will lead to different equilibria in the 
economic game form. However, the two approaches make
very different predictions about the processes by which
groups move between equilibria. In the “institutions as 
equilibria” model, institutional change is a result of random 
drift-like processes followed by competition between groups. 
This is commonly referred to as cultural group selection 
(Richerson & Boyd, 2005), and is inherently a slow process 
because variation is only selected at the group level. 

Moreover, the change of institutions by cultural group 
selection is expected to be discontinuous, with long periods of 
stasis interspersed by sudden and large change when between- 
group competition events occur and groups suddenly jump to 
a new and previously unreachable equilibrium.
Between-group competition must typically take an extreme form
in order to shift another group to a new equilibrium, such as
the extinction of whole groups and the recolonization of their
sites by members of other groups, as modeled by Boyd &
Richerson (1990). However,
the sudden and complete change of behavioral equilibria 
predicted by these models is at odds with empirical 
observations of institutions. Rather, most institutional change 
is gradual (North, 1990; Ostrom, 1990). For example, the 
reliable enforcement of exchange contracts by state courts in 
Europe followed from the informal enforcement mechanism 
of the Law Merchant courts for traders in medieval Europe 
(North, 1990). Similarly, the institutional rules that provided 
for cooperative use of the huerta irrigation systems in 
southern Spain described by Ostrom (1990) developed 
gradually by trial-and-error tinkering of rules over a 1000 year 
period. Indeed, the empirical work of Ostrom suggests that 
sudden imposition of different institutional rules by those 
outside of the group is likely to lead to a reduction in 
cooperation. This is because what works well in the
particular environmental conditions of one group will 
typically not work well in another environment, even if both 
groups face a similar problem such as managing an irrigation 
system (Ostrom, 1990; Baumard, 2010). It is also because 
social groups operate with norms and other informal 
constraints that cannot simply be changed by fiat (North, 
1990). 

By contrast, the “institutions as political game forms” 
model presented here allows institutional rules to change as a 
result of the intentional action of agents over shorter 
timescales. This fits well with the cognitive skills of humans, 
including language and shared intentionality (Tomasello & 
Carpenter, 2007). It accounts for the fact that self-interested 
individuals should be expected to try and craft institutional 
rules that benefit themselves in economic interactions. While 
cultural group selection posits that between-group interactions 
are the driving force in institutional change, the model here 
assumes that institutional rules are affected by the
within-group processes of bargaining and negotiation between
self-interested individuals (the political game form). Institutional
rules are predicted to typically change gradually, and to be 
increments of the preceding rules. The cause of change is that 
one or more individuals estimate that the cost to themselves of 
changing the rules is more than offset by the subsequent gains 
that they will receive under a new economic game form. 
When institutional rules change, the direction of that change 
depends upon the preferences of individual group members, 
and the corresponding bargaining strength of the individuals 
in the political game form (North, 1990; Reiter, 1996).

The fundamental difference between North and Ostrom is
the type of cooperative interaction that they focus on. North 
focuses on the dyadic exchange of private resources, i.e. trade. 
As he stresses, the reason that cooperation is not a problem in 
the neoclassical model of exchange is that both parties to the 
exchange are assumed to have perfect information, and the 
exchange is assumed to happen simultaneously and with 
perfect enforcement of contracts. In reality these conditions 
are never perfectly met. Asymmetries in information mean 
that one party may know more about the goods to be 
exchanged than the other, or may be able to exploit the lack of 
perfect enforcement, leading to a Prisoner’s Dilemma 
situation (North, 1990). North is interested in how 
institutional rules can prevent this from happening, and hence
how the neoclassical gains from dyadic trade can be realized. 
He is quick to point out, however, that there are just as many 
institutional rules which do not promote cooperation as there 
are rules that do. He contrasts the effects of institutional rules 
in Third World economies with those of the West in terms of 
their different effects on economic growth. Ostrom, on the 
other hand, is concerned with the exploitation of common 
resources such as irrigation systems and fisheries. Her 
empirical work shows how the right kind of institutional rules 
can avert the Tragedy of the Commons (Ostrom, 1990). It also 
shows that institutional rules can fail to avert the Tragedy. 
This implies that if we are going to make policy interventions 
to try and increase cooperation, then we need a dynamic 
model of the evolution of institutional rules in order to 
understand what kinds of changes might or might not promote 
cooperation. 

Other work has looked at the effect of various institutional 
rules on the evolution of human cooperation (e.g. Sasaki et al.,
2012; Chen et al., 2014). A key question this work has
addressed is whether rewards or punishment are more 
effective at promoting cooperation in public goods games. 
While this work has examined the effect of varying the 
magnitude of rewards or punishment, it has treated the amount 
of reward or punishment as an exogenous parameter of the 
model. Consequently, these models have not addressed why 
groups would actually settle on different reward and 
punishment schemes. Essentially, the models have looked at 
the effects of varying the outcome of the political game form, 
but have not actually modeled the political game form itself 
and so have not addressed how the institutional rules actually 
evolve. 

The next section provides an example of how the general 
Hurwicz model can be instantiated as a dynamic model of the 
evolution of institutional rules. 

A simulation model of the evolution of 
institutional rewards and punishment 

The model presented here is largely based upon that presented 
in Powers & Lehmann (2013), but modified to allow groups to 
reward cooperators as well as punish defectors. Individuals 
carry three cultural traits that are passed from parent to 
offspring subject to a mutation rate, μ. The first trait
determines whether individuals cooperate and produce B 
units of public good at a cost of C to themselves, or whether 
they defect and produce no public good, and hence pay no
cost. Mutation on this trait involves changing to the other
type. The second trait is a preference, h, (range 0 to 1, 
inclusive) for the proportion of the group’s public good that 
should be used for helping, i.e. distributed between all group 
members to increase their payoff. The remaining proportion of 
the public good is then used to pay for institutional rewards 
and punishment. How this is divided up between reward and 
punishment is determined by the third trait that individuals 
carry. Specifically, individuals have a preference for what 
proportion, r, (range 0 to 1 inclusive) of the remaining public 
good should be used to reward cooperators as opposed to 
punish defectors. Mutation on these preference traits is done 
by adding a small random number drawn from a normal 
distribution with mean 0. 
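
The three traits and their mutation can be sketched in Python as follows. Several details are assumptions made for illustration because the text does not specify them: the mutation rate is passed as `mu`, each trait mutates independently with that probability, the standard deviation `sigma` of the normal perturbation is a free parameter, and mutated preferences are clamped to the range 0 to 1.

```python
import random
from dataclasses import dataclass


@dataclass
class Individual:
    cooperates: bool  # trait 1: cooperate (produce B at cost C) or defect
    h: float          # trait 2: preferred proportion of public good for helping
    r: float          # trait 3: preferred proportion of the rest for rewards


def clamp01(x: float) -> float:
    """Keep a preference inside [0, 1] (boundary handling is an assumption)."""
    return min(1.0, max(0.0, x))


def mutate(parent: Individual, mu: float, sigma: float,
           rng: random.Random) -> Individual:
    """Produce an offspring whose traits are copied from the parent with mutation."""
    cooperates, h, r = parent.cooperates, parent.h, parent.r
    if rng.random() < mu:
        cooperates = not cooperates              # switch to the other type
    if rng.random() < mu:
        h = clamp01(h + rng.gauss(0.0, sigma))   # small normal perturbation
    if rng.random() < mu:
        r = clamp01(r + rng.gauss(0.0, sigma))
    return Individual(cooperates, h, r)


rng = random.Random(1)
parent = Individual(cooperates=True, h=0.5, r=0.5)
offspring = mutate(parent, mu=0.01, sigma=0.05, rng=rng)
```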

Unlike Powers & Lehmann (2013), which modeled 
structured populations, here individuals interact in randomly 
drawn groups of size n (without replacement from the global 
population). Groups are reformed every generation. Within 
groups, individuals play a political game form followed by an 
economic game form. The political game form determines H, 
the proportion of a group’s public good that is used for 
helping. It also determines R, the proportion of the remaining 
public good that is used to reward cooperators as opposed to 
punish defectors. The model assumes an egalitarian political 
game form in which each group member’s preference is 
weighted equally. H and R are then set by taking the mean 
of each group member’s preference (without regard to 
whether the individual is a cooperator or a defector). This is 
then followed by the economic game form, which is modeled 
as a linear public goods game. Cooperators contribute to the 
public good, and may be rewarded for doing so, depending 
upon the outcome of the political game form. Defectors do not 
contribute and may be punished for this. The fitness of 
cooperators (w_c) and defectors ( w_d ) is then given by: 
w_c = (HBn_c)/n - C + (1 - H)RBE,
w_d = (HBn_c)/n - [(1 - H)(1 - R)BEn_c]/n_d.

E is the efficiency of the institution, i.e. the rate at which 
public good is converted into rewards or punishment, n_c is 
the number of cooperators in the group, and n_d the number 
of defectors in the group. The term (1 - H)RBE represents
the rewards to cooperators given by the institutional rules
decided by the group members in the preceding political game
form. Similarly, the term [(1 - H)(1 - R)BEn_c]/n_d
represents the punishment given to defectors according to
the agreed institutional rules. Crucially, because H and R 
depend on individual traits h and r, these institutional rules 
themselves evolve by individual selection. 
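
The political and economic stages for a single group can be written directly from the definitions above. This is a minimal sketch rather than the implementation used to produce the results: the function names are hypothetical, and returning `None` for the defector fitness when a group contains no defectors is an assumption about an edge case that the text does not discuss.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def group_rules(h_prefs: Sequence[float],
                r_prefs: Sequence[float]) -> Tuple[float, float]:
    """Egalitarian political game form: H and R are unweighted group means."""
    return mean(h_prefs), mean(r_prefs)


def fitnesses(H: float, R: float, n_c: int, n_d: int,
              B: float, C: float, E: float) -> Tuple[float, Optional[float]]:
    """Fitness of a cooperator and a defector in a linear public goods game.

    w_c = (H*B*n_c)/n - C + (1 - H)*R*B*E
    w_d = (H*B*n_c)/n - [(1 - H)*(1 - R)*B*E*n_c]/n_d
    """
    n = n_c + n_d
    helping = H * B * n_c / n                      # share of the public good
    w_c = helping - C + (1.0 - H) * R * B * E      # cost paid, reward received
    w_d = None
    if n_d > 0:                                    # punishment needs defectors
        w_d = helping - (1.0 - H) * (1.0 - R) * B * E * n_c / n_d
    return w_c, w_d


# Parameters as listed for Figure 3; the group composition is illustrative.
H, R = group_rules([0.8, 1.0, 0.9], [1.0, 0.5, 0.0])
w_c, w_d = fitnesses(H, R, n_c=10, n_d=5, B=0.9, C=0.1, E=0.9)
```

Because H and R are computed from the heritable preferences h and r, any selection acting on w_c and w_d also acts indirectly on the institutional rules.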

After the public goods game has taken place and fitness 
determined, all individuals in the global population compete 
to form a new population of size N by fitness proportionate 
selection, i.e. individuals leave descendent offspring in 
proportion to their fitness. Generations are non-overlapping. 
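
Fitness-proportionate selection with non-overlapping generations can be sketched with the standard library as below. The text does not say how negative fitness values are handled (a heavily punished defector can have negative fitness under the expressions above), so flooring weights at zero, and falling back to uniform sampling when every weight is zero, are assumptions made only to keep the sketch well defined.

```python
import random
from typing import List, Sequence


def select_parents(fitness_values: Sequence[float], N: int,
                   rng: random.Random) -> List[int]:
    """Sample N parent indices with probability proportional to fitness.

    Sampling is with replacement, so fitter individuals can leave several
    offspring and the whole parental generation is replaced.
    """
    weights = [max(0.0, w) for w in fitness_values]   # assumption: floor at 0
    if sum(weights) <= 0.0:                           # assumption: uniform fallback
        weights = [1.0] * len(fitness_values)
    return rng.choices(range(len(fitness_values)), weights=weights, k=N)


rng = random.Random(42)
parents = select_parents([0.2, 0.5, -0.1, 0.3], N=750, rng=rng)
```

Each selected index would then be passed through the mutation step to create one member of the next generation.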

Results 

A full analysis of the model will be presented elsewhere. The 
purpose here is to illustrate how institutions as political game 
forms can be modeled in simulation. 

The results (Figure 3) demonstrate that
cooperation-promoting institutions can result from a political
game form, even when each individual taking part in the
political game form is following its own self-interest. In the
absence of rewards or punishment, cooperation would not be
stable, and would have a long-run frequency close to 0, using
the parameters in Figure 3 (since it is a standard result that in
such cases cooperation would only be stable when B/n > C, such
that the actor’s share of the benefit it produces is greater than
the cost). However, we see that while individuals evolve to
invest most of their public good in the benefit of helping, they
do invest enough into institutional reward and punishment to
maintain cooperation. Occasionally the average h-trait
becomes very close to 1, meaning that little is invested in
rewards or punishment. In these cases cooperation collapses.
However, cooperation is quickly recovered once the h-traits
start to become slightly smaller again, creating sufficient
rewards such that cooperation again pays more than defection.
Previous work (Powers & Lehmann, 2013) suggests that these 
fluctuations may not happen in structured populations. 
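
A short worked check, using only the fitness expressions for w_c and w_d and the parameter values listed for Figure 3, may make the role of H concrete. It treats H and R as fixed within a group and assumes at least one defector is present; it is an illustration, not a reproduction of the simulation results.

```latex
% Without rewards or punishment (H = 1), a cooperator's own share of the
% benefit it produces must exceed its cost, which fails here:
\frac{B}{n} = \frac{0.9}{15} = 0.06 < C = 0.1 .

% With institutional rules, cooperators out-earn defectors when
w_c - w_d = (1 - H)\,B\,E\left[R + (1 - R)\frac{n_c}{n_d}\right] - C > 0 .

% In the rewards-only case (R = 1) this reduces to
(1 - H)\,B\,E > C
\quad\Longleftrightarrow\quad
H < 1 - \frac{C}{BE} = 1 - \frac{0.1}{0.81} \approx 0.877 .
```

Under these assumptions, a group that devotes more than roughly 88% of its public good to helping no longer rewards cooperators enough to offset the cost C, which is consistent with the collapse of cooperation described when the average h-trait comes very close to 1.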

One important finding is that individuals evolve to create 
institutions that mainly use rewards rather than punishment to 
support cooperation. Chen et al. (2014) argued that the 
optimal strategy should be for groups to use rewards when 
cooperation is rare, but then switch to punishment once 
cooperation is common. Such a policy minimizes the 
expenditure necessary to favor cooperation. However, the 
results presented here suggest that while this policy may be 
the optimum, this does not mean that the evolution of 
institutional rules will necessarily settle upon it. The model 
here suggests that when individual preferences for rewards or 
punishment are evolving, they may tend to favor rewards
even when cooperation is common. This highlights the 
importance of explicitly modeling the process by which 
institutional rules are generated within groups. 


Discussion 

Institutions can be defined as political game forms that 
generate the rules, and hence incentives, for economic 
interactions (Hurwicz, 1996). Taking this view allows us to 
produce dynamic models of institutional evolution. This 
allows us to explore why some groups have historically 
managed to create institutional rules that foster cooperation, 
and why others have failed (North, 1990; Acemoglu & 
Robinson, 2011). Applications of this include understanding
the rise of hierarchy and states, and addressing pressing public 
goods problems such as climate change. 

Cultural group selection models have traditionally viewed 
institutions as equilibria. These models suggest that 
institutional rules change by a slow process of random drift 
and between-group competition. However, individuals should 
be expected to try to craft institutional rules that benefit 
themselves. This means that institutional rules can also change 
as a result of within-group processes, often on much faster 
timescales. 

Future work needs to model political game forms in more 
detail. There is a need for more realistic models of the 
bargaining and negotiation processes that go on within groups 
to generate institutional rules. How can we best model the
bargaining process between individuals with different 
preferences for institutional rules? The processes by which 
political game forms themselves change also need to be 
modeled. When are political game forms likely to move 
between egalitarianism and despotism, as happened, for
example, with the transition from a hunter-gatherer to
agricultural lifestyle 10,000 years ago? 

Figure 3: Co-evolution of institutional rules for rewards and
punishment alongside individual strategies in the economic
game form. Parameters: n = 15, N = 750, B = 0.9, C =
0.1, E = 0.9.

In summary, a framework for modeling institutional
evolution has been presented here. An application of the
framework was illustrated using a simple model of the
co-evolution of individual social behaviors, with individual
preferences for whether groups should reward cooperators, or 
punish defectors. The political game form was modeled as an 
egalitarian process in which the preferences of all group 
members were aggregated. It is intended that this framework 
will allow Artificial Life researchers to address how groups 
can create conditions that support cooperation. In the final 
section of this paper, I turn to discuss how the institutional
modeling approach might help Artificial Life to address
pressing social issues in modern societies. 

What can the institutional approach offer to 
our understanding of societal challenges? 

The problem of cooperation in modern societies manifests 
itself in two forms. The first is in exchange of resources 
between agents, i.e. trade. Trade may be between individuals 
at a village market, between firms within a nation, or between 
nations. The second form of cooperation is in the provision 
and usage of collective goods, ranging from the management 
of a local inshore fishery, through to a global reduction in 
carbon emissions to prevent climate change. 

In all of these cases, what determines whether or not a 
society achieves cooperation is whether or not its institutional 
rules provide the right incentives to the agents in that society. 
Do the institutional rules move the economic game form away 
from a single-shot Prisoner’s Dilemma? The agents could be, 
for example, single individuals, firms, or governments. 

As Ostrom (1990) notes, policy prescriptions by 
economists and other social scientists have traditionally 
involved externally imposing a solution to a cooperation 
problem on a society. For trading, this might involve 
suggesting that a society copy the market rules of a more 
successful society. For collective goods, suggested policies 
might include either dividing the good into private shares, or 
assigning a state body to monitor and enforce rewards and 
punishments (Ostrom, 1990). But as Ostrom stresses, these 
imposed mechanisms of institutional change have repeatedly 
failed. Essentially, this is because what works well in one 
local environment need not necessarily work well in another. 
This is both because local environments will tend to differ in 
ways that affect the economic game form, and because 
different societies have different local norms and customs. 
Transplanting institutional rules into a society in which they 
are not compatible with the norms and beliefs held by the 
agents within that society is unlikely to work. Furthermore, 
norms and beliefs typically change very slowly, which is why
economics tends to explain changes in behavior in terms of 
changes in relative prices rather than by changes in tastes 
(North, 1990). 

This suggests that to make successful policy prescriptions 
we need a bottom-up understanding of how institutional rules 
change within societies. Traditional models in economics 
have focused on equilibrium conditions. But such models, 
along with cultural group selection models, are ill suited to 
capture the dynamics of institutional evolution, because 
institutions typically change through many small and gradual 
changes. And while the Hurwicz framework and similar 
approaches (e.g. Reiter, 1996) have been proposed in 
economics, they have not been instantiated in a fully dynamic 
form that fits particular empirical scenarios. 

This is where Artificial Life, and the related field of 
agent-based economics, comes in. At its very core, Artificial Life is 
concerned with producing the bottom-up generation of 
behavior. This is exactly what is needed to understand how 
agent behavior and institutional rules co-evolve. To date, a 
convincing theory of institutional change has been lacking. A 
convincing model of institutional change needs to both allow 
institutional rules to change as a result of individual agent 
behavior, and to allow for the fact that individual agents are 
not perfectly rational and have incomplete information about 
their environment. These are both traditional strengths of 
Artificial Life. 

Artificial Life researchers are also used to dealing with 
complex systems in which small perturbations can sometimes 
cause large and unexpected shocks. This is quite likely to 
occur with institutional evolution, where small changes in the 
political game form may lead to large changes in the 
economic game form. Again, the toolkit of bottom-up 
modeling is well equipped to highlight this. 

By using Artificial Life simulation techniques, we can 
begin to get a handle on the effect that changing institutional 
rules is likely to have on economic game forms, and on how 
these changes in the economic game form feed back into 
changed individual preferences in the political game form. We 
can also start to appreciate the effect of different political and 
constitutional game forms on this process. All of this has 
previously lain outside the scope of static equilibrium models, 
which has limited the ability of analysts to foresee the 
implications of policy changes. 

Abstract 

One of the key components of the suspension of disbelief in 
real-time 3D simulations is the apparent authenticity of 
actions and gestures played by the individuals of the virtual 
population. This paper addresses this aspect of simulation 
by investigating ways to improve the behavioral realism of 
virtual humanoid characters in groups and small multitudes. 
We look at the framework of ALife, identifying and 
analyzing existing bio-mimicking techniques that can be used in 
this context and contribute towards improving the 
plausibility of the generated simulations. By looking at the 
literature, we identify some of the key elements from ALife 
that are being progressively incorporated in the simulations of 
groups and crowds. Then, we discuss a generative model for 
spontaneity and heterogeneity where bio-inspired agents are 
individualized with DNA-like strings and appear organized 
hierarchically, exchanging token units of energy, mass, and 
resources. The result is a generative population of agents that 
self-organize and interact autonomously, exhibiting 
interesting social dynamics based on biological tenets and an 
economy of resources. We analyze this simulation quantitatively 
with the purpose of studying the impact of each of the 
previously identified techniques. 



Figure 1: Global view of the environment of the experimental 
setting created for this paper. 


Introduction 

Animation of crowds in the historical site of Pompeii (Maim 
et al., 2007), or visitors in theme parks (Shao and 
Terzopoulus, 2006) are good examples of a developing area of 
research that looks at modeling virtual spaces inhabited by 
communities of humanoid avatars that self-organize and 
interact autonomously. Commercial video games, such as The 
Sims (Electronic Arts, 2015), Assassin’s Creed 4 (UbiSoft, 
2014) and Grand Theft Auto (Rockstar Games, 2015) also 
share similar goals, with great success in terms of 
mainstream appeal. Traditionally, the algorithms for modeling 
crowds attempt to simulate the realistic behaviors of the 
crowd at the macro-level, including the features of its spatial 
flow (Helbing, 1992; Hughes, 2003). Recently, more 
attention has been paid to the micro-level, centered on individual 
behavior within a multitude (Weizi and Allbeck, 2011; Park 
et al., 2012). The challenge is to be able to create complex 
scenes in real-time, with generative populations of virtual 
humans, interacting autonomously, where behaviors 
resemble the variety-rich feel of the real world. 

Human biology, psychology, social organizations, and 
relationships form complex networks within which behaviors 
occur. In this complex matrix, the cognitive and biological 
systems act as elementary forces in generating and shaping 
motivations. Physiological and psychological processes are 
dynamic, and two individuals sharing similar initial 
conditions may act in different ways when facing identical stimuli 
according to their past experiences and environmental 
context. In that sense, groups of humans can be described as 
complex adaptive systems, since they act as a form of 
decentralized, distributed processing, where their internal states 
and the environment interoperate in feedback loops. 
Influenced by dynamic variations in individual motivations, 
one-to-one interactions occurring at the local scale lead to 
changes in the patterns observed at the higher group and 
population levels. 

One area of knowledge sharing the interest in related 
themes of complex phenomena is Artificial Life (ALife), a 
discipline characterized by the study of complex processes 
observed in organisms and communities. We are interested 
in investigating how the framework of ALife can benefit 
the field of crowd simulation, in particular through its 
emphasis on the phenomena of emergence and self-organization. 
Bio-inspired systems, known as Computational Ecosystems, are 
part of this framework. These are multi-agent systems where 
individuals appear organized in a hierarchical way (in the 
form of a food-chain), and traditionally agents have their 
motivations based on their self-sustenance and the 
perpetuation of their genetic patrimony. The different internal 
states of each agent during its regular activity (searching 
for energy, fighting, eating, etc.) generate individual 
differentiation at the local scale of the community and 
continually change global patterns and flows. 

We have built on this type of system to develop an agency 
model for generative populations of humanoid characters 
with social dynamics based on biological tenets and an 
economy of resources. We further developed a simulation using 
this model with the objective of analyzing its overall 
behavior quantitatively. The purpose of this study is to understand 
the impact and benefits of techniques from the framework of 
ALife implemented in the context of behavioral simulation 
of humans in groups and multitudes. 

We have organized the paper as follows. First, we 
discuss the objectives and contextualize this work with related 
work in crowd and group simulation. Then, in the section 
Methodology, we provide details of the agency model and 
its implementation in a population of autonomous trading 
agents (Figs. 1 and 2). In the section Results, we discuss the 
outcome of this experiment, bringing up the advantages and 
disadvantages of this model, and we conclude by suggesting 
future possibilities for research and development drawing on 
this approach. 

State of the art 

Attention to metabolic functions has traditionally been at 
the core of ALife practice with multi-agent systems (Yaeger, 
1994; Taylor and Hallam, 1997; Dorin, 2009). Traditionally, 
agents require a permanent input of token units of some sort 
from an external source. Usually, these tokens are identified 
as ‘energy.’ This energy is then converted into useful 
activity when agents spend it performing their regular activities. 
This simple mechanism provides an intrinsic motivation to 
act in the world. We see this motivational strategy 
appearing in a growing number of simulations of humans, 
echoing Abraham Maslow’s empirical observation that metabolic 
functions precede other human activity (Maslow, 1943). Sevin 
and Thalmann describe an agent that needs to satisfy its 
hunger, thirst and tiredness, and consequently needs to eat, 
drink, rest and sleep (Sevin and Thalmann, 2004). However, 
this was implemented as a single-agent model. 

Later, these functions became relatively more frequent 
within groups and multitudes of virtual humans. For 
instance, we can see them reappearing in Navarro et al. 
(Navarro et al., 2015), or in Silverman et al. (Silverman 
et al., 2005) and Cassenti’s (Cassenti, 2009) work. In AStar, 
Navarro and colleagues likewise implement hunger and 
thirst as bottom-level motivational factors for agency 
(Navarro et al., 2015). In PMFServ, Silverman’s agents are 
motivated by their energetic requirements and their stress 
levels, and they also suffer fatigue, requiring some resting 
time to sleep and recover (Silverman et al., 2005). The 
correlation between stress and metabolic functions causes a 
corresponding degree of fallibility, similar to that of their 
human counterparts. 

One of the interesting aspects of metabolically oriented 
agents such as the above is that they become individually 
differentiated as their internal states progress differently 
from each other. We can see a similar approach in Trescak and 
colleagues (Trescak et al., 2014), where agents have 
basic metabolic needs with changing levels of hunger, 
thirst, fatigue and comfort. Since objects have annotated 
functions, such as ‘fish-cook’ or ‘fish-catch’, agents 
interact with them differently according to their internal 
states. Moreover, these authors went a step further in their 
appropriation of ALife’s framework. In their simulation 
of the Babylonian city of Uruk, they represented individuals 
using a DNA-like string, which encodes the visual 
representation and the social roles played by the agent in the 
world. Antunes and Leymarie (Antunes and Leymarie, 2013) also 
explore the DNA mechanism to define hierarchical classes 
of individuals that establish the social relationships. Another 
technique these latter authors have borrowed from ALife is 
reproduction. In their simulation of gregarious 
humanoids, they have included states of birth and death to 
create dynamic changes in the population density. The 
internal states of the agents are also used to trigger 
differentiated gestural animations depending on the type and 
outcome of the interactions. 

Given the growing attention that this framework is 
gaining in the field of group and crowd simulation, it seems 
opportune to address this topic of research now, and to 
identify and study the impact of its techniques. 

Methodology 

We defined our experimental setting as a community of 
autonomous humanoid characters built on a 
model of agency that draws upon ALife’s predicates. The 
objective is to be able to measure and study its properties 
in terms of the generated patterns of behavior, namely their 
diversity and spontaneity. 

The model of agency 

We were inspired by earlier work on societies of agents from 
the domain of ALife (Holland, 1996; Yaeger, 1994; 
McCormack, 2001). The structure of our population was 
similarly arranged in hierarchical layers organized in a way 
to promote interaction and resource exchange. Continual 
changes of patterns and flow characterize this type 
of multi-agent community, since to satisfy their goals 
individuals need to move to find interaction partners. We found 
it useful to include a similar and relatively simple 
motivational layer, where agents need to exchange token units of 
energy and resources, motivated by their survival and the 
perpetuation of their genetic patrimony. 

Figure 2: Close up of two individuals interacting. 

DNA based individuals 

Individuals in the virtual population are identified by a 
DNA-like code, which functions as a blueprint defining their 
particular features. The sixteen binary digits of the DNA 
structure encode various phenotypic aspects such as a) the 
type of resources produced and those required to perform 
metabolic functions, b) the initial parametrization of the 
psychological component, c) the definition of procreation 
compatibility, and d) the biological component of the 
personality. These features establish a class distinction between 
individuals, which allows us to define functional hierarchies 
organized around the trade of resources and reproduction. 

Eq. 1 describes the blueprint, where each of the characters 
from a to q in the DNA-set stands for one parameter, or part 
of one. 

\[ \mathrm{DNA} = \{a, b, \ldots, o, q\}, \quad a, b, \ldots, o, q \in \{0, 1\} \tag{1} \]

Reproduction 

Agents emulate a life cycle, including birth, death (by 
lack of energy), and reproduction. When individuals 
multiply, their progeny inherits half of each parent’s 
genetic blueprint using a crossover operation with associated 
mutation. The mutation operator flips each allele-bit 
probabilistically (10%). When individuals are born, they appear 
in the animation from a predefined arbitrary building. 
Similarly, when they die, they move to another one before they 
are removed from the system. This technique allows us to 
have continuous fluctuations of population density, as well 
as a discontinuous diversity of functions and roles. 
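
The reproduction step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the text only states that the child inherits half of each parent's blueprint and that each bit flips with 10% probability, so the uniform choice of which eight positions come from each parent, and all function names, are assumptions.

```python
import random

DNA_LENGTH = 16        # sixteen binary digits, as stated in the text
MUTATION_RATE = 0.10   # per-bit flip probability, as stated in the text


def crossover(parent_a, parent_b, rng):
    """Build a child that takes exactly half of its bits from each parent.

    Assumption: the positions inherited from parent_a are chosen uniformly
    at random; the paper does not say which crossover scheme is used.
    """
    positions = list(range(DNA_LENGTH))
    rng.shuffle(positions)
    from_a = set(positions[: DNA_LENGTH // 2])
    return [parent_a[i] if i in from_a else parent_b[i] for i in range(DNA_LENGTH)]


def mutate(dna, rng, rate=MUTATION_RATE):
    """Flip each allele-bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in dna]


def reproduce(parent_a, parent_b, rng=None, rate=MUTATION_RATE):
    """Crossover followed by mutation, returning the child's blueprint."""
    rng = rng or random.Random()
    return mutate(crossover(parent_a, parent_b, rng), rng, rate)
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which is convenient when comparing population statistics between runs.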

Metabolism 

Resources occupy a central role in this scheme. They have 
a dual function: firstly, they are required to generate energy; 
secondly, the agent recycles them to produce other types of 
resources. As a hypothetical example, the metabolic 
function of individual i uses one unit of resource type a and 
one unit of resource type b to generate one unit of resource 
type c and 100 units of energy. 

Each possible action performed by the agent has an 
associated cost expressed as token units of energy. For instance, 
the action ‘move’ will require one token of energy per 
meter whereas ‘run’ will use two. As a consequence, 
agents need a regular input of energy. To generate this, each 
agent needs to gather resources and trade the ones it 
owns for those it needs. This gives agents an intrinsic 
motivation to act in the world. 

Algorithm 1 describes the metabolic function, where e_t 
is the value of energy at time t, {a, b, c} are the resources 
owned by the agent and required for its metabolism, d 
stands for the resources produced, and k is a constant value. 


Algorithm 1 The metabolic function 

for all time t do 
    if digestionTime AND {a, b, c} > 0 then 
        e_t ← e_{t-1} + k * 3 
        {a, b, c} ← {a - k, b - k, c - k} 
        d_t ← d_{t-1} + k 
    end if 
end for 
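
A runnable sketch of one tick of Algorithm 1 is given below. The `Agent` container, the field names and the default `k = 1` are illustrative assumptions; only the update rule itself (gain 3k energy, consume k of each required resource, produce k of the output resource) comes from the algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical container for the quantities used in Algorithm 1."""
    energy: float = 0.0
    inputs: dict = field(default_factory=lambda: {"a": 0.0, "b": 0.0, "c": 0.0})
    produced: float = 0.0  # the resource d generated by the metabolism


def metabolize(agent, k=1.0, digestion_time=True):
    """Apply one tick of the metabolic function; return True if it fired.

    Mirrors the condition in Algorithm 1: it is digestion time and every
    required resource is strictly positive.
    """
    if digestion_time and all(amount > 0 for amount in agent.inputs.values()):
        agent.energy += k * 3
        for name in agent.inputs:
            agent.inputs[name] -= k
        agent.produced += k
        return True
    return False
```

Calling `metabolize` once per simulation step reproduces the loop over t; an agent that has run out of any one required resource simply stops gaining energy until it trades for more.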



Psychology 

Each individual has its own personality and emotional 
temperament. This is defined with a three-layered psychological 
model, integrating: i) short-term emotions, which result 
from goal achievement; and a temperamental factor, 
combining ii) a long-term mood, which is the accumulated 
memory of these emotions; and iii) a biological imprint, which 
is a genetically determined component of the personality of 
the agent. Mehrabian’s PAD (Mehrabian, 1996) is used to 
represent these personality traits. PAD describes 
psychological features as three-dimensional vectors of Pleasure, 
Arousal, and Dominance, where each dimension uses the 
bipolar space [-1, 1]. 

Eq. 2 describes the biological component of personality, where 
k_1, k_2 and k_3 are constant values determined by the 
DNA-blueprint. 

\[ \mathrm{personality} \leftarrow \{k_1, k_2, k_3\}, \quad k_1, k_2, k_3 \in [-1, 1] \tag{2} \]

Eq. 3 describes the mood component at time t, which 
is a combination of three vectors: the first with the previous 
mood at time t-1, the second with the last emotional state, 
and the third with the agent’s personality; α, β and γ are 
weight coefficients. 

\[ \mathrm{mood}_t \leftarrow \alpha \, \mathrm{mood}_{t-1} + \beta \, \mathrm{emotion} + \gamma \, \mathrm{personality} \tag{3} \]

PAD dimensions are also used to define the emotional 
vector. These are derived from affective appraisals of the 
interactions, as described next. 
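
The mood update of Eq. 3 can be illustrated with a short sketch over PAD triples. The weight values below are placeholders (the paper does not report them), and clamping the result to [-1, 1] is an added assumption made so that the mood stays inside the bipolar PAD space.

```python
def update_mood(mood, emotion, personality, alpha=0.6, beta=0.3, gamma=0.1):
    """Apply Eq. 3 to each of the Pleasure, Arousal and Dominance dimensions.

    alpha, beta and gamma are hypothetical weights; the clamp to [-1, 1]
    is an assumption, not something stated in the paper.
    """
    return tuple(
        max(-1.0, min(1.0, alpha * m + beta * e + gamma * p))
        for m, e, p in zip(mood, emotion, personality)
    )
```

With these placeholder weights, a neutral agent (mood and personality both at zero) that experiences a fully positive emotion moves only part of the way towards it, which is the "accumulated memory" behavior the text attributes to the long-term mood.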

Reinforcement learning 

We have implemented the behaviors using a Markov chain 
(Fig. 3). The probabilities associated with each state transition 
are updated after each transition and, more importantly, 
after every interaction. Rewards of state transitions 
have an affective dimension: they incorporate 
context-dependent utilitarian and emotional components. The 
appraisal of a situation not only updates the weights of the 
transitions in the chain, but also gives rise to new 
emotions. When we calculate the reward of a state transition, 
we do so by accessing its dimensions of Pleasure, Arousal, and 
Dominance. This appraisal gives rise to a vector with the 
current emotion. As described above in Eq. 3, 
emotions have a direct impact on the current mood. This, in 
turn, is also used to determine the outcome of interactions. 
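
As an illustration of a Markov chain whose transition probabilities are adjusted by rewards, consider the sketch below. The paper does not give its update rule, so the additive update, the learning rate, the positive floor on weights and the example transitions are all assumptions chosen for simplicity; the reward is taken as a single number that a caller would derive from the PAD appraisal.

```python
import random


class BehaviorChain:
    """Markov chain over behavior states with reward-adjusted weights."""

    def __init__(self, transitions, rng=None):
        # transitions: {state: {next_state: weight}}; weights need not sum to 1.
        self.weights = {state: dict(nxt) for state, nxt in transitions.items()}
        self.rng = rng or random.Random()

    def probabilities(self, state):
        """Normalize the outgoing weights of `state` into probabilities."""
        total = sum(self.weights[state].values())
        return {nxt: w / total for nxt, w in self.weights[state].items()}

    def step(self, state):
        """Sample the next state according to the current probabilities."""
        probs = self.probabilities(state)
        return self.rng.choices(list(probs), weights=list(probs.values()))[0]

    def reinforce(self, state, nxt, reward, rate=0.1, floor=0.01):
        """Shift the weight of one transition by rate * reward.

        The floor keeps every transition possible (hypothetical choice).
        """
        updated = self.weights[state][nxt] + rate * reward
        self.weights[state][nxt] = max(floor, updated)
```

For example, a chain built from `{"S3": {"S1": 1.0, "S2": 1.0}}` (hypothetical arrows from Wander to the two Move states) starts with equal probabilities, and a positive reward after taking S3 to S1 makes that transition more likely on the next visit to S3.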



Figure 3: Diagram of the Markov chain. Circles indicate 
states: S0-Rest; S1-Move to potential mate; S2-Move to 
potential prey; S3-Wander; S4-Found mate; S5-Previously 
known mate; S6-Next to an unknown mate; S7-Mate; 
S8-Found prey; S9-Next to known prey; S10-Next to an 
unknown prey; and S11-Attack. Arrows indicate state 
transitions. 


Interactions 

Agents interact based on their metabolic and reproductive 
needs. Sensors for energetic deficit, hunger, and 
sexual arousal drive the desire to connect with others. 
Interactions are then established based on the DNA definition of 
the classes of individuals. As such, when hungry, an individual 
will only search for potential prey, avoiding all other 
individuals. In this context, potential prey are individuals 
that can provide the resources this one needs and are 
simultaneously interested in the ones it can supply. The outcome 
of the interactions depends on a) the mood of both partners, 
and b) the utility of this particular interaction for both 
participants. 

Algorithm 2 describes one of the possible interactions: the 
transfer of resources {a, b, c} from agent j to agent i, and 
the reciprocal transfer of {d, e, f} from i to j, in a 
cooperative interaction of type ‘eat’ between the two agents, with 
the energy e of each agent decremented by the associated cost k; 
α, β, γ, δ, ε and ζ are weight coefficients. 


Algorithm 2 The eating interaction 

{a_i, b_i, c_i} ← {a_i + a_j α, b_i + b_j β, c_i + c_j γ} 
{a_j, b_j, c_j} ← {a_j - a_j α, b_j - b_j β, c_j - c_j γ} 
{d_j, e_j, f_j} ← {d_j + d_i δ, e_j + e_i ε, f_j + f_i ζ} 
{d_i, e_i, f_i} ← {d_i - d_i δ, e_i - e_i ε, f_i - f_i ζ} 
e_{i,t} ← e_{i,t-1} - k 
e_{j,t} ← e_{j,t-1} - k 
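
The exchange in Algorithm 2 can be sketched as a pair of proportional transfers followed by the energy cost. The dictionary layout, the function names and the use of a separate `energy` field (to avoid the clash between the resource named e and the energy e) are assumptions made for this illustration.

```python
def transfer(giver, receiver, fractions):
    """Move a fraction of each named resource from giver to receiver.

    `fractions` maps a resource name to its weight coefficient, e.g. the
    alpha, beta, gamma of Algorithm 2. The amount removed from the giver
    equals the amount added to the receiver, so resources are conserved.
    """
    for name, fraction in fractions.items():
        amount = giver["resources"].get(name, 0.0) * fraction
        giver["resources"][name] = giver["resources"].get(name, 0.0) - amount
        receiver["resources"][name] = receiver["resources"].get(name, 0.0) + amount


def eat(agent_i, agent_j, from_j, from_i, cost_k):
    """Cooperative 'eat' interaction: reciprocal transfers plus energy cost."""
    transfer(agent_j, agent_i, from_j)   # {a, b, c} flow from j to i
    transfer(agent_i, agent_j, from_i)   # {d, e, f} flow from i to j
    agent_i["energy"] -= cost_k
    agent_j["energy"] -= cost_k
```

A call such as `eat(i, j, from_j={"a": 0.5}, from_i={"d": 0.25}, cost_k=1.0)` moves half of j's stock of a to i and a quarter of i's stock of d to j, and charges both agents one unit of energy.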


A video of the system running is available at 
https://youtu.be/_W0KEz52Ksw. To analyze the simulation 
quantitatively, we ran it for one hour and captured 
snapshots of the population every minute. 
The next section presents and discusses some of these 
results. 

Results and Discussion 

We analyze the behavior of this population in quantitative 
terms. We focus on the aspects 
provided by the techniques from ALife’s framework previously 
identified in the State of the Art: a) the DNA blueprint, b) 
Reproduction, c) Metabolism, d) Psychology, and e) 
Reinforcement Learning. 

Our starting point is the framework presented by 
Antunes (Antunes, 2013). This author presents a discussion on 
diversity and heterogeneity, from which we borrow some of 
the methodological tools. First and foremost, we look at the 
DNA blueprint as a mechanism that allows control over the 
diversity of the population. Antunes suggests the Shannon and 
Pielou indexes of diversity and evenness to measure the 
population’s level of diversity and entropy (Antunes, 2013). 


\[ J \leftarrow \frac{H}{H_{\mathrm{max}}} \tag{4} \]

\[ H \leftarrow -\sum_{i=1}^{R} p_i \ln(p_i) \tag{5} \]

In ecological studies, these indexes allow us to determine 
how evenly distributed the species are (Mulder et al., 2004). For 
instance, consider a population composed of five foxes and 
one thousand rabbits. This community is far from even. 
Eq. 4 describes this relation, where J is the evenness 
factor, ranging in the interval between 0 and 1; the higher the 
value obtained, the less variability there will be between the 
species. H denotes a number derived from the Shannon 
diversity index and H_max is the maximum value of H, equal 
to ln S (Mulder et al., 2004). Shannon’s index of diversity 
(Shannon, 1948) is used to measure a population’s 
heterogeneity. Claude Shannon introduced this index to measure 
the entropy in strings of text. Eq. 5 describes Shannon’s 
index, where the richness of biodiversity in a community, H, 
is a function of the total number of species R and the proportion 
of individuals p_i belonging to the ith species. The 
Shannon index increases as both the richness and the evenness of 
the community increase. Typically, values are between 1.5 
and 3.5 in most ecological studies, and the index is rarely 
greater than 4. In our simulation, we have defined 
speciation according to the three binary digits of mating criteria in 
the DNA (eight possible species). The evenness of this run 
was 0.936. Shannon’s index was determined to be 1.946. 
These results indicate that this population is rich, diverse and 
evenly distributed in genetic terms. 
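
For readers who want to reproduce these two indexes, a minimal sketch follows. It uses the standard definitions of Shannon's index (with the natural logarithm) and Pielou's evenness; the function names are ours, and taking the number of species as the number of non-empty groups in the input is an assumption (the paper uses the eight possible species).

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over non-empty species."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou evenness J = H / ln(S), with S the number of non-empty species."""
    species = sum(1 for c in counts if c > 0)
    if species < 2:
        return 0.0  # evenness is not meaningful for a single species
    return shannon_index(counts) / math.log(species)
```

For the foxes-and-rabbits example, `pielou_evenness([5, 1000])` is well below 0.1, whereas eight equally sized species give J = 1. The values reported for the run are mutually consistent under these definitions: with eight species, 1.946 / ln 8 is approximately 0.936.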

This result shows that communities can have evenly 
distributed phenotypes and still be relatively varied. The DNA 
technique seems useful for developers of crowd 
simulations, as it introduces an easily implementable 
mechanism to define and establish hierarchical classes and 
relationships. 

The second aspect under scrutiny was the population 
density. Fig. 4 shows the evolution of the population 
density over time. We can see a rapid increase of the population 
from the initial 50 to about 65 and then an even sharper 
rise to about 140. After that, numbers fluctuate within a 
range of about 20 individuals, with a declining trend. 

The fluctuations observed are an important aspect to 
retain with regard to simulations of humans, since this 
mechanism introduces naturally occurring variations in the 
number of characters that are simultaneously present in a 
scene, a situation that is common in real-world scenarios 
involving humans. 

The third aspect under scrutiny was the spatial 
distribution of the population, and the activities performed at each 
moment of time. Fig. 5 shows two distinct moments of the 
run. We took snapshots of the location in space of each 
individual in the population at intervals of 
one minute. We then overlapped these snapshots in 
intervals of ten minutes. Fig. 5-Top shows the first period of 
the run and Fig. 5-Bottom the second period. The graph 
indicates that the population moves, as expected, between the 
four interest points, minimizing effort by using the shortest 
paths. However, far from a uniform distribution, the 
population keeps changing spatially, shifting local attractors, thus 
occupying the space differently throughout the run. We can 
see clusters and new paths forming in different spots 
between the top and bottom images. 

Figure 4: Illustration of the evolution over time of the 
population density. 

This result is what we would expect from an ecosystem’s 
dynamics, where these local attractors are structures 
emerging from the self-organization of the individuals in their 
natural behavior and struggle. 

This graph also shows an interesting exploratory 
behavior, where a considerable number of agents move away from 
the aggregation areas (the shortest paths between the 
interest points), dispersing across the landscape. This pattern 
probably results from the learning algorithm, which allows a 
dynamic prioritization of goals. Again, this pattern is 
interesting from the human simulation point of view. Agents not 
only move between their goals, tending to follow the 
least expensive emerging pathways, but also explore their 
landscape and aggregate in dynamic clusters that form and 
reshape. These are patterns of behavior that developers of 
simulations of human behavior might take advantage of in their 
animations. 

Continuing this spatial analysis, we found it 
relevant to also look at the actions performed. We 
recorded what each agent was doing at the exact 
moment of the snapshot. Fig. 6 depicts an arbitrary moment 
of time. We can see that agents were mostly moving from 
point A to point B. However, others were also engaged in 
interactions. These interactions differed in nature and 
were spatially dispersed in the environment. 

We can also observe variation in the internal states when 
interactions were of an identical type. Fig. 7-Top 
overlaps the interactions throughout the first ten minutes of the 
run. We found the level of exploratory behavior shown 
remarkable. However, it seems that individuals find it 
beneficial to remain in areas of high density 
where interactions tend to occur. 

Figure 5: The graph shows a juxtaposition of frames taken at 
intervals of one minute each, revealing a time-lapse of the 
spatial flow of the population over time. Top: First ten 
minutes of the run; Bottom: Second interval of ten minutes. The 
red circles indicate attractor points where agents go when 
they have nothing else to do. The arrows in the bottom 
image point to areas where we noticed a significant change of 
occupation. 

Fig. 7 presents a summary of the activities throughout 
the run. As expected, agents spent most of their time 
(48%) walking. The rest of the time, they were involved in 
interactions: 29% of their time was spent in trading 
activities (Attack-Lose, Eat), and 24% in mating (Mating-Lose, 
Reproduce). Agents were successfully cooperating 34% of the 
time, and were recorded interacting with a non-cooperative 
attitude only 19% of the time. We can attribute the 
relatively high share of time spent in interactions to the limited 
number of goals that we specified initially. 

Figure 6: Graph showing a snapshot of the actions that 
agents are performing at one arbitrary moment of time. 

Figure 7: Actions being performed in the world. Summary 
of the run. 

We present the parameters used in the animations in Fig. 
8. We found large standard deviations in both the duration 
of the interactions and the maximum number of neighbors. 
These results confirm the expected high level of variation 
amongst the individuals, which was our initial goal. The 
variation of the personal space is comparatively much smaller. 
We can attribute this difference to the fact that we 
restricted the boundaries of this parameter to keep 
the animation within plausible limits, with characters 
interacting in acceptable proximity. Even so, there is 
still wide variation across the population, as we can see in 
the visualization of the personal space in Fig. 8 (Right). 
The maximum speed was similarly tuned (restricted) for the 
sake of the animation, to avoid characters moving too fast. All 
these different levels of individual variation are important 
for group and multitude simulation since they provide 
multiple degrees of adjustment of individual expression. In the 
next section, we continue this discussion and make 
final remarks to conclude the paper. 

Figure 8: Interaction parameters. Left: Animation 
parameters (Duration of the interactions, Personal space, Maximum 
number of neighbors and Maximum speed); Right: Snapshot 
of the individual personal spaces at an arbitrary moment of 
time. 

Conclusions 

We have looked at techniques from the framework of 
ALife to study their impact on the simulation of populations of 
virtual humans. For this purpose, we have generated a 
population of autonomous agents whose bio-inspired behavior 
was implemented drawing upon a set of techniques 
originating from the framework of ALife. Agents are individually 
defined with a DNA-blueprint. They simulate a life cycle 
including death and reproduction, and their metabolism 
motivates them to search continually for new partners with 
whom to trade useful energy or procreate. Agents’ behavior is 
defined using a Markov chain, with dynamic probabilities updated 
using intrinsic reinforcement learning. Learning follows 
from appraisals of the agent’s interactions that are both 
functional and emotive. We have defined a three-layered 
psychological model, integrating: i) short-term emotions, 
which result from goal achievement; and a temperamental 
factor, combining ii) a long-term mood, which is the 
accumulated memory of these emotions; and iii) a biological 
imprint, which is a genetically determined component of the 
personality of the agent. Mehrabian’s PAD is used to 
represent these personality traits. 

The resulting population is composed of self-organizing 
social individuals that are autonomous and able to adapt and 
prioritize their goals and behaviors, expressing rich and 
varied behaviors that are relatively consistent and coherent with 
their past actions. They are capable of spontaneous 
interactions where personality and emotions play a relevant role. 
These interactions are biologically motivated, and their 
quality is heavily dependent on these psychological traits, as 
they affect the viability, outcome, and duration of the 
interactions. Results of analyzing a run of this population seem 
to indicate that these techniques indeed play important roles 
in improving a set of features that contribute to raising the 
level of realism of the simulations. 

We have identified a set of factors gaining direct 
benefits from the implementation of this framework: i) in the 
first place, spontaneity and heterogeneity are critical aspects 
when it comes to simulating human multitudes - we found that 
self-organization, motivated by the agent’s basal metabolic 
and sexual instincts, results in populations showing a high 
level of heterogeneous and spontaneous behaviors. What 
this entails for human simulation is the possibility of 
creating generative communities that have intrinsic 
motivations to act and interact autonomously; ii) another 
noteworthy aspect of this study is the potential diversity entailed 
by this framework - human crowds are not homogeneous in 
shape and form, and the level of entropy in our simulation, 
caused by the inclusion of DNA-based reproduction, 
indicates a raised level of differentiation amongst the individuals 
in the population. As mentioned earlier, this carries the 
advantage of simplifying the definition of interaction 
relationships based on class and hierarchies; iii) the third 
observable benefit is the fluctuation of population density 
caused by births and deaths, leading to communities whose 
size varies over time; iv) finally, as a result 
of their differentiated psychology, we were also able to 
observe differences in the quality of socialization amongst 
the individuals, with nuanced variation in their spatial 
relationship with their neighbors and the rest of the crowd. 

From the results above, it seems clear that ALife’s
framework provides rich tools to be used in the context of human
crowd simulators. This framework allows the construction
of generative populations of intrinsically motivated agents
that: i) self-organize and interact autonomously, ii) show
spontaneous and heterogeneous behaviors, and iii) have a high
level of diversity and individuality amongst their individuals.

To conclude, we can say that the increasing presence of
bio-inspired techniques originating in ALife in the field
of human group and crowd simulation is enriching these
simulations in multiple and varied aspects that raise, to some
degree, the plausibility of the animations produced. We
cannot ignore this growing interest, and this fact has justified
the pertinence of a study on the impact of this framework on
that type of work. Our contribution fits this need, and in this
paper, we have identified and studied some of these structural
techniques, helping to understand their impact and benefits.
Abstract

Detection and analysis of collective behavior in natural and
artificial systems is a difficult task which is commonly
delegated to a human observer. We present a statistical
framework to automatically detect emergent, collective behavior
of agents in agent-based simulations which exhibit swarming
and flocking behavior. Our tunable, translational-, rotational-,
and scale-invariant framework - geometry of behavioral
spaces - identifies common behaviors among agents and
translates these behaviors into a system’s behavioral
primitives, along with the agent transitions from one behavioral
primitive to another. Finally, we use complex network
analysis to detect collectives of agents that gravitate into a
common cluster of behavioral primitives as the system’s emergent
behavior condenses or decays. We apply complex network
theory to the analysis of collective behavior dynamics in the
simulations of flocking and swarming to validate our analysis.
Our framework does not use knowledge of the parameter
space that drives the models, and relies only on the temporal
agent trajectories of exhibited behavior. The utility of
detecting emergence from exhibited behavior makes this technique
suitable as a fitness function for stochastic search algorithms,
for analyzing evolutionary dynamics of systems with collective
behaviors, for detecting structures in artificial chemistry
experiments, or for analyzing physical systems such as bacterial
formations.

Introduction 

Agent Based Models (ABMs) are widely used to study and
model collective phenomena in natural and artificial
systems. However, there is no universal methodology for how
to construct or analyze these systems. The solutions that a
group of cooperating agents, using only local agent-to-agent
interactions, finds to problems defined on a system-wide
level, such as foraging for food, designing grassroots
movements in societies, or escaping a predator, can
frequently only be understood by a human observer. Designing
ABMs with cooperative behavior to solve problems is
difficult since it requires bridging the gap between the design
of individual interaction rules and the system-wide problem
definition. We present the results of applying a previously
developed, statistically based computational framework to
detect the onset and dissipation of collective behavior in a
system using only displayed system dynamics - without any
knowledge of the parameter space that controls the system
or the laws that drive the system’s components. Our work is
focused on analyzing agent based models, although the
framework can be used in both the analysis of emergent system
behavior and system engineering.

In nature, flocking and swarming are examples of
system-wide behaviors used by organisms to interact with
their environment as a collective rather than as individuals.
Many natural and artificial systems with collective dynamics
share common characteristics in how they are constructed:
they are composed of simple agents that interact with each
other using simple interaction rules, yet the sophisticated
collective dynamics of the ensemble result in the system’s
evolutionary advantage, survival, or task completion. We
use the behavior of flocking and swarming ABMs to test
our framework’s ability to (1) generalize agents’ “noisy”
behaviors into common behavior groups and (2) differentiate
between model behaviors that differ only in the velocity
and shape of the agent ensemble.

Stochastic search algorithms are commonly used to
automatically design the interaction rules of the ABM’s agents.
These search tools rely on a fitness function to measure the
quality of system behavior in solving a problem. Building a
fitness function that includes the level of cooperation among
agents that leads to the desired goal requires significant human
expertise and (if successful) results in a custom-tailored,
single-purpose function that is hard to generalize to ABM
design for different environments. Simple fitness functions, on
the other hand, often only measure whether the simulation
reached or made progress towards a goal - regardless of how the
goal was reached. It is not guaranteed that the resulting system
solves a problem using emergent system behavior. Even if
collective behavior is present in the system dynamics, it is
usually by coincidence and not by design, which is a
significant source of error when modelling real-world systems. We
show the utility of our framework in measuring the amount
and direction of collective behavior in a system, which can
be a useful component of a fitness function designed to reward
ABMs with cooperative behavior.

Our analytical framework measures both the quality and
quantity of cooperation among agents. Our analytical tool
functions independently of the scale and movement of agent
behaviors, and is applicable to multiple models and to
different optimization techniques. Bee-hive optimization,
particle optimization, and swarm robotics are examples of
additional techniques developed to solve problems using
collective, hierarchical system organization. Although these
tools share common principles, there is a lack of tools to
analyze system behavior for different optimization tools at
varying granularity of analysis. Since our analysis uses only
the agent behaviors without any knowledge of the model’s
parameters, the framework is independent of the optimization
technique. This allows for a computational comparison
not only of different model executions, but also of
results found by different optimization techniques.

The presented results illustrate the ability both to
generalize stochastic agent behaviors in the system dynamics into
common patterns of behavior and to differentiate between two
very close collective behaviors, flocking and swarming.
Plotting the counts of agents in each prototypical behavior at
each time-step of the model’s execution allows us to predict
the phase transition in the system dynamics from collective
to dis-associative and vice-versa. Inspecting the temporal
dimension of the behavioral spaces allows us to explicitly
measure the level of self-organization, emergence and the
directionality of the system dynamics: towards condensation
or decay.

The design of fitness functions for stochastic search
algorithms that evolve agent systems solving problems through
emergence, and the analysis of collective behaviors in ABM
dynamics, are only two applications of the geometry of
behavioral spaces framework. The framework can also be used to
explore the multi-parameter spaces that drive various models,
fine-tune models to increase performance, or identify a set of
high-diversity solutions by identifying how they solved a
problem. With alterations, the framework can be used as a
general pattern-finding engine regardless of application.

Background and Motivation 

Hierarchical system decomposition (HSD) (20), pattern
oriented modeling (POM) (4) and morphogenetic engineering
(MGE) (3) are common methods used to build systems
of agents that solve problems. HSD is a reductionist
approach where the system is built by addition of the
constituent components, and it fails to construct systems with
non-linear dynamics. POM requires a spatially explicit
landscape for its agents to move across and interact with. The
spatial nature of these models allows for the use of spatial
statistics to detect local, emergent behavior, but fails to
identify emergent behavior on the global scale. MGE is
the most sophisticated and relies on “expressing” the
system’s construction to measure its quality. For example,
rule-based swarms and graph-based grammars describe the
system construction and self-organization in computational
development swarms (10; 18). The solution’s quality is
measured as the system’s ability to perform its task or the
difference between the expressed and desired patterns (14).
POM and MGE have made significant progress towards
developing quantifiable ABMs, but general tools to analyze
the dynamics of these models lag behind.

Analyzing non-linear system dynamics has been at
the forefront of science for years, with a few efforts to
analyze the dynamics of individual and agent based
models. Miller used a system of non-linear equations to describe
the ABM’s dynamics and to measure the effect of changing the
model’s parameter configurations on the resulting system
dynamics (11). A summary of statistical and mathematical
tools to describe the collective dynamics of swarm and
multi-agent systems was presented by Lerman et al. (7; 8).
The pragmatic efforts of Calvez and Stonedahl resulted
in the extension and implementation of automatic stochastic
search tools to explore the parameter search-spaces of ABM
configurations (1; 15). Controlling vehicle swarms using
physics-based expert systems was proposed by Spears et
al. (13), while Miner et al. proposed Markov processes to
characterize multi-agent behaviors (12).

The individual and system-wide dynamics were analyzed
using information-theoretic tools: Van et al. studied the
expected agent behaviors in the Kugler-Turvey ant colony
model (6; 17), Lizier et al. used Shannon-based
entropy to analyze the micro- and macro-level agent
dynamics (9), and the behavior of swarm robots was analyzed
by Wang et al. and Lizier et al., also using
information-theoretic tools (9; 19).

The geometry of behavioral spaces framework complements
previous research by providing a problem-domain-independent
analysis with the following features: the ability to
detect emergent processes from the agent behaviors, to produce
heuristics for tuning the system behavior, to filter
agent behaviors of varying frequency, and to visualize the
network of behavioral primitives for both high-level and detailed
inspection of the model’s constituent behaviors. One
feature of our framework that is not addressed by previous
work is the ability to provide real-time system monitoring
that measures the velocity and direction of the system’s
condensation towards, or decay from, organized behavior.

Methodology 

The geometry of behavioral spaces framework is a multi-step
process that analyzes recorded agent trajectories for
common patterns of behavior. First, each agent in the model
records the direction of its movement; then the statistics and
similarities of agent behaviors at each time step are
computed. A second scan of the recorded behaviors is used to
construct a behavioral transition state-space and the
distribution of how many agents were in which behavioral
primitive at each time step.
Figure 1: Geometry of behavioral spaces methodology: (a) recording of agent trajectories in four cardinal directions, (right)
extracting past and future behaviors $\delta^-$ and $\delta^+$ at the first six time-steps, (b) compression of extracted histories into a vector, (c)
recording the co-occurrences of encoded past and future histories into the matrix C, (d) clustering similar rows into behavioral
primitives, (e) the behavioral primitive look-up for two consecutive past histories during the second scan of the agent’s trajectory
vector and (f) recording the behavioral transitions in two subsequent steps, creating the state-space transition matrix T


The following sections describe the details of each
analytical step: the simulation space is discretized and agent
trajectories are recorded (Figure 1 a); at each time step, the
agent’s past and future windows of behavior are first compressed
and then logged into the co-occurrence matrix (Figure 1 b, c);
then the groups of common past behaviors that have the
same or similar enough future behaviors are created and
define the behavioral primitives of stable behaviors (Figure 1 d);
and finally, the behavioral space is constructed to reveal
the system-wide transitions among behavioral primitives
along with the likelihood of agents transitioning from
one behavioral primitive to another stable pattern of
behavior (Figure 1 e, f). The final analysis of the behavioral space
is used to differentiate whether the system dynamics are
condensing into flocking versus swarming behavior or whether
the system behavior is decaying into random agent behavior. The
complex network analysis is shown in the results section
(Figures 3 and 6).


Data Collection and Space Discretization 

In order to report agent trajectories, every agent records its
initial position and compares that position to its current
position. When the distance travelled exceeds a threshold value,
the agent resets its initial position to its current position,
reports the general direction it travelled, in this case a value
between 1 and 4, and repeats this process. In this manner,
high-frequency (small-scale, repetitive) behaviors are not
recorded, as they fall below the threshold for distance
travelled from the previous report, whereas large-scale
movements are more frequently reported because these agents
move more quickly and therefore report more often. Using
this approach, we can ensure that every agent experiencing
the same behavior reports in the same manner, independent
of the agent’s spatial disposition within the model.
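The reporting rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `report_directions`, the mapping of the codes 1 to 4 onto east, north, west and south, and the use of Euclidean distance are all assumptions made for the example.

```python
import math

def report_directions(positions, threshold):
    """Turn a list of (x, y) positions into a list of direction codes 1..4.

    A report is emitted only when the agent has moved farther than
    `threshold` from its last reference position.
    """
    reports = []
    ref_x, ref_y = positions[0]
    for x, y in positions[1:]:
        dx, dy = x - ref_x, y - ref_y
        if math.hypot(dx, dy) > threshold:
            # Assumed coding: 1 = east, 2 = north, 3 = west, 4 = south.
            if abs(dx) >= abs(dy):
                code = 1 if dx > 0 else 3
            else:
                code = 2 if dy > 0 else 4
            reports.append(code)
            ref_x, ref_y = x, y  # reset the reference position
    return reports

path = [(0, 0), (0.4, 0.1), (1.2, 0.2), (1.3, 0.1), (1.3, 1.5), (0.0, 1.6)]
jitter = [(0, 0), (0.1, 0), (0, 0), (0.1, 0)]
print(report_directions(path, threshold=1.0))    # three large moves
print(report_directions(jitter, threshold=1.0))  # small oscillations only
```

Small oscillations around the reference position never exceed the threshold, so they produce no reports, which matches the filtering of high-frequency behavior described in the text.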

At the end of the simulation, each agent’s trajectory is a
vector of the directions in which the agent moved, with values
ranging from one to four, together with the time-stamp and the
agent’s identifier. At any time-step t, the sequence of
$\delta^-$ previous moves is called the agent’s past history and
the sequence of $\delta^+$ subsequent moves is the agent’s
future history.

Co-occurrence matrix C

Each agent’s past and future histories at each time step t are
compressed into a vector of length four, where each vector
position (attribute) holds the number of times the history
contained a move in the given direction (Equation 1). The
number of cardinal directions used to record the agent’s movement
can vary to yield a higher-resolution analysis or to reflect
different space tessellations. In this paper, we use four cardinal
directions.


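$$v(\delta^-) = \left(\Sigma(1 \in \delta^-),\ \Sigma(2 \in \delta^-),\ \Sigma(3 \in \delta^-),\ \Sigma(4 \in \delta^-)\right)$$
$$v(\delta^+) = \left(\Sigma(1 \in \delta^+),\ \Sigma(2 \in \delta^+),\ \Sigma(3 \in \delta^+),\ \Sigma(4 \in \delta^+)\right) \qquad (1)$$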
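As an illustration of the compression step, the sketch below counts the occurrences of each direction code in a window of moves. The helper name `compress`, the example trajectory, and the choice to start the future window at time-step t itself are assumptions made for the example.

```python
def compress(history, n_directions=4):
    """Count how often each direction code 1..n_directions occurs."""
    counts = [0] * n_directions
    for move in history:
        counts[move - 1] += 1
    return tuple(counts)

trajectory = [1, 1, 2, 4, 4, 4, 3, 1]
t, window = 4, 3                       # hypothetical time-step and window length
past = trajectory[t - window:t]        # the `window` moves before t
future = trajectory[t:t + window]      # the `window` moves from t onward
print(compress(past), compress(future))
```

Because only counts are kept, two histories containing the same moves in a different order compress to the same vector.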

The co-occurrences of encoded past and future behaviors
$v(\delta^-)$ and $v(\delta^+)$ are logged into
$C[v(\delta^-_t), v(\delta^+_t)]$ for all time steps k (Equation 2).


$$C[v(\delta^-), v(\delta^+)] = \sum_{\forall k} \left(v(\delta^-_{t+k}) : v(\delta^+_{t+k})\right) \qquad (2)$$
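A minimal sketch of how such a co-occurrence table could be accumulated is given below. It stores C as a nested dictionary keyed by the compressed past and future vectors; the function names, the equal window length for past and future, and the rule of skipping time-steps without a full window are assumptions of this example rather than details stated in the paper.

```python
from collections import defaultdict

def compress(history, n_directions=4):
    """Count how often each direction code 1..n_directions occurs."""
    counts = [0] * n_directions
    for move in history:
        counts[move - 1] += 1
    return tuple(counts)

def co_occurrence(trajectories, window, n_directions=4):
    """Return C as a nested dict: C[past_vector][future_vector] -> count."""
    C = defaultdict(lambda: defaultdict(int))
    for moves in trajectories:
        # Only time-steps with a full past and a full future window are used.
        for t in range(window, len(moves) - window + 1):
            past = compress(moves[t - window:t], n_directions)
            future = compress(moves[t:t + window], n_directions)
            C[past][future] += 1
    return C

agents = [[1, 1, 1, 1, 2, 2], [1, 1, 1, 1, 1, 1]]   # two toy trajectories
C = co_occurrence(agents, window=2)
for past, row in C.items():
    print(past, dict(row))
```

Each row of the resulting table lists how often a given past behavior was followed by each observed future behavior.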

Behavioral Primitives 

Each row of the co-occurrence matrix C is a likelihood
distribution of a given past behavior resulting in any of the
observed future behaviors. In the next step, we group all
such rows that are similar to each other, creating clusters of
past behaviors that resulted in sufficiently similar future
behaviors to be considered a behavioral cluster - a behavioral
primitive e (Equation 3). We used the $\chi^2$ test of statistical
independence between any two rows that were not previously
assigned to a behavioral primitive to measure the similarity
between two row distributions of past behaviors. Other tests
can be used to calculate this similarity.


$$e_i = \text{if } \left(\chi^2(C[i,:],\ C[j,:]) < \alpha\right) \text{ then } e_i \cup C[j,:] \qquad (3)$$
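One way to read the grouping rule is as a greedy pass over the rows of C. The sketch below is only an approximation of that idea: it compares the raw chi-square statistic of a 2 x k contingency table against a hypothetical `cutoff`, whereas a faithful use of the threshold in the paper may instead compare a p-value, which would require the chi-square distribution. The function names and the toy rows are assumptions, and every row is assumed to contain at least one count.

```python
def chi_square(row_a, row_b):
    """Chi-square statistic of a 2 x k contingency table built from two rows."""
    total_a, total_b = sum(row_a), sum(row_b)
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(row_a, row_b):
        col = a + b
        if col == 0:
            continue  # an empty column carries no information
        exp_a = total_a * col / grand
        exp_b = total_b * col / grand
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return stat

def group_rows(rows, cutoff):
    """Greedily assign each unassigned row to the first similar seed row."""
    primitive_of = {}
    next_id = 0
    for i in range(len(rows)):
        if i in primitive_of:
            continue
        primitive_of[i] = next_id
        for j in range(i + 1, len(rows)):
            if j not in primitive_of and chi_square(rows[i], rows[j]) < cutoff:
                primitive_of[j] = next_id
        next_id += 1
    return [primitive_of[i] for i in range(len(rows))]

rows = [[40, 10, 0], [38, 12, 0], [2, 8, 40]]   # toy rows of C
labels = group_rows(rows, cutoff=7.0)           # cutoff chosen for illustration
print(labels)
```

The first two rows have nearly identical future distributions and end up in the same primitive, while the third row starts a new one. The result of a greedy pass depends on row order, which is one reason to treat this as a sketch only.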

Behavioral Space T 

The behavioral matrix $T_{e \times e}$ describes the high-level agent
behaviors and the transition of an agent’s behavior from one
stable behavior to another. To construct the matrix T, we
scanned the agent trajectories a second time. For each
agent, we look up which behavioral primitive the agent’s past
behavior belongs to at two consecutive time steps t and t+1:
$p = v(\delta^-_t)$ and $r = v(\delta^-_{t+1})$ respectively. We record the
agent transitions between two behavioral primitives into a
matrix T[p, r] - a state-space transition matrix of exhibited
behaviors in the simulation (Equation 4).


$$T[p, r] = \sum_{\forall k} \left(e_p : (v(\delta^-_t) \in e_p),\ e_r : (v(\delta^-_{t+1}) \in e_r)\right) \qquad (4)$$
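The second scan can be sketched as a count of transitions between the primitives of two consecutive past histories. In the example below, `primitive_of` is a hypothetical lookup table from compressed past vectors to primitive labels, and the labels themselves are invented for illustration.

```python
from collections import defaultdict

def compress(history, n_directions=4):
    """Count how often each direction code 1..n_directions occurs."""
    counts = [0] * n_directions
    for move in history:
        counts[move - 1] += 1
    return tuple(counts)

def transition_counts(trajectories, window, primitive_of):
    """Count transitions between primitives at consecutive time-steps."""
    T = defaultdict(int)
    for moves in trajectories:
        for t in range(window, len(moves)):
            p = primitive_of[compress(moves[t - window:t])]          # past at t
            r = primitive_of[compress(moves[t + 1 - window:t + 1])]  # past at t+1
            T[(p, r)] += 1
    return T

# Hypothetical lookup table produced by the grouping step.
primitive_of = {(2, 0, 0, 0): "east", (1, 1, 0, 0): "turn", (0, 2, 0, 0): "north"}
T = transition_counts([[1, 1, 1, 2, 2]], window=2, primitive_of=primitive_of)
print(dict(T))
```

Self-transitions such as ("east", "east") correspond to an agent staying in the same stable behavior between two time-steps.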



Figure 2: A complex network visualization of the final
behavioral matrix T for swarming (top) and flocking (bottom)
models with the system behavior varied every 400 time-steps
from cooperative to random. Highly connected nodes were
further clustered into communities, and the nodes of each
community are shown in the same color.

Geometry of Behavioral Spaces 

The final step is a complex network analysis of the
matrix T, with the behavioral primitives as the network’s
nodes and the behavioral transitions as the network’s edges.
We further clustered the tightly coupled behavioral
primitives into communities of closely related behaviors,
filtered out transition edges with low edge weight, and used a
network layout to visualize the behavioral space (5). For
additional details of the behavioral spaces analysis, please see
(Cenek and Dahl, in press).

Figure 3: Agent counts in each behavioral primitive at every time-step of the swarming (left) and flocking (right) model
executions with the oscillation period between condensed cooperative and decayed random behaviors every 400 time-steps. At
each time-step a bar chart shows the top 25 behavioral primitives with the highest counts of agents, selected (globally) at
the end of the simulation. Each color in the bar represents one active behavioral primitive and its size is proportional to the
number of agents with that behavior at that time-step. The top of each plot shows miniature snapshots of the active transition
edges of the behavioral matrix T as a complex network at 200 time-step intervals.

To construct a dynamic view of the agent behaviors
during the simulation, tracking the progress towards the
condensation or dissipation of collective behavior, we counted
how many agents, at any given time step, are in which
behavioral primitive. Figure 4 shows the histogram of the top
25 most frequent behavioral primitives, and the same aspect
is shown in Figure 2, where the network nodes representing
each behavioral primitive have their size proportional to
the count of agents in that behavior (the node’s weighted
degree).
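The per-time-step counts can be sketched as below, assuming each agent's sequence of behavioral primitives has already been looked up. The data layout `labels_by_agent[a][t]` and the primitive names are hypothetical.

```python
from collections import Counter

def primitive_counts(labels_by_agent):
    """labels_by_agent[a][t] is the primitive of agent a at time-step t."""
    n_steps = min(len(labels) for labels in labels_by_agent)
    return [Counter(labels[t] for labels in labels_by_agent)
            for t in range(n_steps)]

labels = [["A", "A", "B"],
          ["A", "B", "B"],
          ["A", "C", "B"]]
counts = primitive_counts(labels)
active = [len(c) for c in counts]   # number of active primitives per time-step
print(counts)
print(active)
```

In this toy run all agents share one primitive at the first and last time-steps and spread over three primitives in between, which is the kind of change in the distribution that the histograms are meant to expose.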

Results 

We applied the geometry of behavioral spaces framework to
flocking and swarming models with global cooperative
behavior among agents (16). To illustrate the framework’s
ability to detect the regime changes between cooperative
and random system behaviors, we varied the model
parameters that control the agents’ ability to coordinate with other
agents (alignment, vision, radius etc.) every 200 or 400
time-steps to force either cooperative system behavior or its
dissipation into a random, dis-associative behavior.

The models ran for between 1500 and 1900 time-steps,
which allowed for 7 and 2 regime changes at 200 and 400
time-step periods respectively. The system dynamics were
analyzed using past and future history vectors 15
components long, four cardinal directions for reporting agent
movement, and the threshold $\alpha = 0.05$ for the $\chi^2$ measure to
group the rows of the co-occurrence matrix C into behavioral
primitives.

The complex network view of the behavioral space only
shows the behavioral primitives of the giant component with
degree > 0 and a minimum edge weight of 10. Figure
2 shows the global behavioral landscape for the two models.
The network’s node and edge sizes are proportional to their
weighted degree and weight, respectively.
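The filtering described above can be approximated with a small graph routine. The sketch below drops transitions below a minimum weight and returns the largest connected component; treating edges as undirected and ignoring self-transitions when testing connectivity are assumptions of this example, as are the function name and the toy transition counts.

```python
def giant_component(T, min_weight):
    """Nodes of the largest connected component after dropping weak edges."""
    neighbors = {}
    for (p, r), weight in T.items():
        if weight >= min_weight and p != r:
            neighbors.setdefault(p, set()).add(r)
            neighbors.setdefault(r, set()).add(p)
    best, seen = set(), set()
    for start in neighbors:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:                      # depth-first traversal
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# Toy transition counts keyed by (from_primitive, to_primitive).
T = {("a", "b"): 25, ("b", "c"): 12, ("c", "d"): 3,
     ("e", "f"): 40, ("a", "a"): 100}
print(sorted(giant_component(T, min_weight=10)))
```

With a minimum weight of 10 the weak edge between c and d is removed, so d disappears from the view and the component containing a, b and c is the largest one.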

The node clusters with the same color identify the highly
connected behavioral primitives (5). These communities
of behavioral primitives show common transitions among
agent behaviors. For example, the behavioral landscape of
the swarming model (Figure 2 top) has the behavioral
primitives of condensed cooperative behavior organized as the
center cluster of nodes. The peripheral behavioral clusters
represent the agent behaviors after the system decayed into
random behavior. The communities of behavioral primitives in
the flocking model (bottom) are interconnected, since the
model’s condensed emergent system behavior has several
small groups of agents that move in one of eight
generalized directions. Each direction is seen as one community of
nodes with the same color. After the decay of global
cooperative behavior, the agents’ trajectories slowly diverge and an
increasing number of behavioral primitives are activated. This
change of behavior can be seen in the time-slice plots of the
active behavioral spaces as the miniature sub-networks (top)
in Figure 3 (please note that the coloring and layout of these
miniature graphs differ from those in Figure 2). A window
of 20 time-steps was used to show the active edges of the
behavioral space at each time-slice (Figure 3 top).

Analysis and Discussion 

All system measures presented in our result plots are
quantitative measures of system dynamics and can be used to
analyze a system’s emergent behavior, to be included as a
parameter in the fitness function of stochastic search
algorithms, or to identify the high-diversity solutions by
differentiating the indifferent solutions. Additionally, this simplified
and discretized data set is entirely contained in a behavioral
matrix, allowing the model to be compared with other
models. Machine learning can also be used to name
behaviors or translate this raw data into human-readable form,
although this is also simple to do manually.

Figure 4: Counts of how many agent behaviors were
recorded in each behavioral primitive during the model’s
execution. The x-axis shows all possible behavioral primitives
sorted in alphabetical order from <0,0,0,15> to
<15,0,0,0>.

The behavioral primitive histograms (Figures 3, 4) and the
global behavioral primitive count plots (Figure 5) are the
fingerprints of the system’s dynamics. The histograms of
active behavioral primitives show a regime shift as a change
in the distributions (and counts) of agent behaviors. The blue
color at the bottom of the plot is the most frequent
behavioral primitive and represents the “random” behavior. As
the system behavior decays, the number of agents with
random behavior increases, and so do the counts of active
behavioral primitives.

The qualitative analysis of the system’s behavior is
done by constructing the behavioral landscape networks
shown in Figure 2. Note that although the framework’s
parameters are tunable, we ran the analysis “straight out of the
box” without fine-tuning any of the analytic framework’s
parameters to identify different behavioral features from the
system’s dynamics.


We executed the model with different random seeds, and
the analytical framework’s results were stable with little
variation in the reported dynamics. We do not report the
results of the statistical validation and variance of the
analytical framework. The stability of the analysis in generalizing
the system dynamics can be seen in the highly correlated agent
counts in each behavioral primitive in the series
Swarming200 vs. Swarming400 and Flocking200 vs. Flocking400
shown in Figure 5.

Note that the “random” behavior is different in each of
the models. In the flocking model, the disordered regime
consists of the agents slowly drifting apart at random, which
means they dissociate slowly from their ordered flocking and
never reach a fully random state. In the swarming model, the
random behavior is agents moving in completely random
directions almost instantaneously after the model parameters
were changed. The regime shifts are shown in Figure
3 as the sudden increase in the counts of behavioral
primitives between steps 400 and 450. This trend parallels
the increase in the diversity of system behavior. The difference
between how the two systems decay to random behavior is
the difference between the sudden increase in the
behavioral primitive counts in the swarming model (left) versus
the gradual decay (the gradual increase) in the flocking
behavior (right). The behavioral primitive histograms in Figures 3
and 4 show the same dynamic features as the time series
distributions in Figure 6. Both can be used to detect
when system dynamics are condensing towards stable,
cooperative behavior and when they are dissipating into random
behavior.

The ability to generalize the stochastic agent behaviors
into stable behavioral primitives can be seen in all histogram
figures (Figures 3, 4) and in the system dynamics measured
on the state-space transition networks (Figure 6). The
oscillation of the model’s behavior between the cooperative and
random regimes resulted in the histogram’s oscillations, but
more importantly, the framework repeatedly generalized the
agent behaviors to the same behavioral primitives. This
is seen as the same color pattern and structure in the
histogram’s periods. The framework also characterized the
agent behaviors to the same behavioral primitives in the
independent model executions with the regime changes every
200 or 400 time-steps. The histogram pattern in Figure 4 would
be close to identical to the histogram in Figure 3 if the time
axis were compressed.

Figure 6 supports the same observation about the
framework’s stability in generalizing agents’ random behaviors.
Regardless of the oscillation period between the cooperative
and random regimes in different model executions, or the
repeated condensation and decay of behaviors within the
same model execution, the shape difference between a pair
of model executions is minuscule. The difference in
offset between the 200 and 400 plot-lines arises because the latter
model execution lasted 500 time-steps longer and resulted in
higher agent counts in each behavioral primitive.
Figure 5: Counts of how many agent behaviors were
recorded in each behavioral primitive during the model’s
execution. The x-axis shows all possible behavioral primitives
sorted in alphabetical order from <0,0,0,15> to
<15,0,0,0>.


All analytical measures applied to the resulting
geometry of behavioral spaces on the system dynamics of
flocking and swarming models clearly show the system’s
condensation towards cooperative behavior and subsequent
dissipation towards random behavior. In this context we use
the term “geometry” to refer to the structural features
calculated using the complex network analysis. Figure 6 shows
an overview of the network measures that can be used to
identify the system’s dynamics towards a regime shift. A
sudden increase or decrease in a measure’s slope indicates the
direction and the velocity of the system’s impending behavior
regime shift towards condensation or dissipation. A stable
system behavior (cooperative or random) will have a
measured slope near zero.
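The slope criterion can be illustrated with a finite-difference sketch over a time series of one network measure. The function names, the `lag` and `tolerance` parameters, and the sample clustering values below are hypothetical and chosen only to show the idea.

```python
def slopes(series, lag=1):
    """Finite-difference slope of a measure over `lag` time-steps."""
    return [(series[t] - series[t - lag]) / lag
            for t in range(lag, len(series))]

def regime_shifts(series, lag, tolerance):
    """Return the time-steps whose slope magnitude exceeds `tolerance`."""
    return [t for t, s in zip(range(lag, len(series)), slopes(series, lag))
            if abs(s) > tolerance]

# Invented values of an average clustering coefficient over seven time-steps.
clustering = [0.80, 0.81, 0.80, 0.55, 0.30, 0.29, 0.30]
shifts = regime_shifts(clustering, lag=1, tolerance=0.1)
print(shifts)
```

The sign of the slope at the flagged time-steps gives the direction of the change, and its magnitude gives the velocity. Outside the flagged steps the slope stays near zero, which corresponds to a stable regime.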

Stochastic search algorithms are popular tools that do not
need to know how to find a solution, only what needs to
be solved and whether the candidate solution made progress
towards a desired goal. As explained earlier, a fitness
function that drives the automatic searches should be
inclusive of different aspects of evolved system dynamics.
In this case, it should reward the solutions that (1) use
cooperative behavior among agents to solve a problem and (2) are
different from the rest of the solutions. If a stochastic search
found multiple methods of solving a given problem, the final
state-space transition matrices and the complex measures of
the geometry of the behavioral spaces are one way of
differentiating how different candidate solutions solved the
problem. A correlation measure between two system measures
in Figure 6 will reveal how different the system dynamics
are from each other. This allows for rewarding the solutions
with different system dynamics than the rest of the solutions
that also solved the problem, creating a metric for originality
in computer models and a mechanism for creative problem
solving.
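As a rough illustration of the correlation idea, the sketch below computes the Pearson correlation between the time series of one system measure for different candidate solutions and turns it into a dissimilarity score. The choice of Pearson correlation, the `dissimilarity` score, and the sample series are assumptions of this example; the paper does not specify which correlation measure is intended. The series are assumed to be equally long and non-constant.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equally long, non-constant series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def dissimilarity(candidate, others):
    """Hypothetical score: 1 minus the highest correlation with any other solution."""
    return 1.0 - max(pearson(candidate, other) for other in others)

solution_a = [0.8, 0.8, 0.3, 0.3, 0.8, 0.8]
solution_b = [0.7, 0.7, 0.2, 0.2, 0.7, 0.7]   # same shape as a, shifted down
solution_c = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8]   # a different dynamic
print(round(dissimilarity(solution_a, [solution_b]), 6))
print(round(dissimilarity(solution_a, [solution_c]), 6))
```

A score near zero means the candidate behaves like a solution that is already known, while a larger score marks dynamics that a fitness function could reward as original.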

In the future we plan to test our methodology on a broader
range of models with emergent system behavior and focus
on evolving agent rules using a fitness function that
incorporates the measure of cooperation among system agents
to achieve the simulation goal. Outside of the ABM
application, we plan on using the methodology to analyze the
recorded trajectories of bacteria from a high-speed,
high-power video feed to automatically detect the formation of
bacterial colonies. Using the framework’s strength as a
general pattern finder, we hope to apply our methodology to
detect the emerging signatures in cyber-security intrusion
attacks.

Figure 6: The complex network analysis of active behavioral
spaces at each time-step of the model’s execution.
Examples of the active sub-networks are shown as the miniature
graphs in Figure 3. The figure shows the average clustering
coefficient (top row) and degree/edge distributions (bottom
row) over time for the swarming (left column) and flocking
(right column) models with varying condensation and
dissipation of system behavior every 400 time-steps.
Abstract

Throughout history, whenever new technologies have 
emerged that change our means of production and ability to 
communicate they have tended to transform society. 
Spearheaded by digitization, followed by emerging living and 
intelligent technologies, our world is currently being 
transformed into something we have difficulty imagining. The 
transition is likely similar in scale to what we experienced 
moving from an agriculturally based society to the industrial 
society, although it will occur at a much faster pace. I present 
key qualities of our emerging societal transition, discuss 
underpinning scientific issues, and propose a way the scientific 
community could engage. Finally, I document how part of the
European Commission, the Danish Parliament and Press as 
well as interested stakeholders engage (or not) in the process 
of creating a “brave new world”. 

The postindustrial world 

Our political institutions, the rule of law, human rights, the 
banking system, our education system - and even capitalism as 
we know it - are mainly products of the industrial age. Using
narratives from the industrial age we have learned to navigate 
the industrial economy as individuals, and as societies we can 
exert some control to define its shape and limits. But what 
comes next, in a postindustrial world? 1 Even in the past decade, 
digital products and services, the internet and mobile 
technology have changed our lives. This is mainly the result of 
accumulated advances over the past 50 years and there is much 
more to come. For example, recent studies (Frey and Osborne, 
2013) 2 indicate that digitization is likely to replace about half 
of known job functions within 20 years. 

Thanks to automation, only a small percentage of the 
population will be needed to produce and distribute what 
everybody needs. For example, today less than 3% of the 
Danish population is engaged in agriculture and fishery 3 , down 
from almost everybody some 150 years ago - and these 3% can 
feed several countries the size of Denmark. As technology 
becomes more life-like (Rasmussen et al., 2011) 4 more 
components can be recycled, in the same way that materials are 
within biological systems. With the development of personal 
fabricators (Gershenfeld, 2003 5 and Packard et al., 2010 5 ) -
super-advanced 3D-printers - it’s likely citizens will be able to 
design, share, manufacture and recycle pretty much everything 
they need locally. 

These new technologies are likely to lead to big changes in 
society, and these could be as drastic as the differences between 
the Stone Age and the Bronze Age, or between the agricultural 
society and the scientific age of industry. Inevitably, such a
shift leads to changes in economic and political systems,
national sovereignty, balances of power, the environment, the 
human condition, even religion. But this time the changes will 
not take place over hundreds of years, but within a generation 
or so. These changes hold promises for amazing possibilities as 
well as grand challenges. 

The BINC Manifesto 

Because of these ongoing changes, part of the scientific 
community is in the process of assembling a so-called BINC 
Manifesto 7 named after the key converging technologies that 
shape the ongoing changes: the bio-, info-, nano- and cogno-
(BINC) technologies. The BINC Manifesto calls scientists and 
interested stakeholders to action to identify and document 
observables, trends, mechanisms and key issues concerning the 
emerging, mainly technology-driven societal transition. (1) The
primary mission is to find out how things are (the facts). (2)
Secondarily - and separately from (1) - we as citizens and
scientists may propose possible scenarios for how to develop 
our new postindustrial societies. 

The Manifesto is organized around five cross-cutting issues, 
each formulated as a list of scientific conjectures that can be 
falsified or verified: 

A) How is the digital economy different from the industrial? 
A1) Digital products and services represent an increasing part
of the value creation. 

A2) Only the first digital unit requires capital, land and labor, 
the following copies are practically free of cost. This means 
profit without production and less need for employees. Further, 
digital products have no transportation costs, they are global 
from the moment they are released, so the best products win 
and take all: there is no market for the second best. However, 
the threshold is also lower to enter (a fair) market. 

A3) The marginal costs of material production approach zero 
(0) due to automation. 

A4) With Personal Fabrication pretty much anything can be 
manufactured locally (open source software and hardware). 
A5) An increasing part of the economy is based on derivative 
trading (speculation). 

A6) Established economic theories are inadequate to address 
the current reality. 

B) Citizens in cyberspace and citizens as biological creatures 
B1) Information and communication technology (ICT) design
and infrastructure implementation cements power structures 
(central or decentralized). 
B2) Currently, ICT is mostly implemented with a resulting
greater concentration of power (government, communication, 
banking, platforms for social media). 

B3) Big business, governments and international intelligence 
use the digital infrastructure to access private data from the 
citizens, which means loss of freedom and power for the 
citizens. 

B4) Massive control of the information flows and the 
associated perception enables modeling of individuals’ 
decision processes and value chains, which in part determines 
what it means to be human. 

B5) Synthetic biology (SB) increasingly makes it possible to 
alter our genetic makeup. These technologies in a very direct 
manner have the potential to impact what it means to be human. 
B6) ICT and SB generate a significantly more complex world. 

C) In the developed economies the middle class and
democracy are eroding

C1) There is greater return on investment in speculation than in
production; you become wealthier from being rich than
from working.

C2) Information, humans and money can travel freely across 
national borders. On the individual level, everybody 
increasingly participates in one global job market. 

C3) Businesses move to places where they don’t need to pay 
taxes. Nation states compete among each other to provide the 
lowest taxes. 

C4) The middle class is increasingly challenged to provide the 
tax revenue needed to secure good nation state governance. 
C5) Economic elites are taking over the political power and 
democracy is deteriorating. Economic and hence political 
power is concentrated in ever fewer hands.

C6) Fair markets are manipulated by search engine algorithms 
when they have a monopoly.

C7) Elections can be manipulated by search engine algorithms. 

D) Global interconnectedness also means global
interdependency and that no nation state can take care of
its citizens alone.

D1) We have entered the Anthropocene. There is only one
environment, e.g. local consumption generates global warming 
and human impact is causing a mass extinction of species. 

D2) The converging bio-, info-, nano- and cogno- (BINC) 
technologies, which are developed everywhere, will transform 
the world faster than ever before into something we have 
difficulty imagining as the pace of these new inventions 
increases exponentially. 

D3) The global population continues to grow and is predicted 
to reach 9 billion around 2050, accompanied by a wide range 
of migration issues and cultural clashes. Also, the emergence of,
and migration to, mega-cities has created new local-global
communities.

D4) It becomes increasingly challenging to align radically
different economic, cultural and governance structures, e.g.
traditional Arabic, industrial Russian and digitized
Scandinavian, as individuals from previously distant cultures
are now mixing.

D5) There are no global institutions in place that can handle 
this transformation, nor do we have the necessary legal 
frameworks and theories. 


E) The need for new narratives. 

E1) The political spectrum of left and right used to be about
capital interests. Left and Right emerged from the industrial 
society and the struggle for power between workers and capital. 
Today, increasingly the largest corporations don’t own
production capital in the traditional sense (e.g. Google,
Facebook, Airbnb, Uber, Amazon) and our pensions
depend on the performance of the stock market. Today, in the
West, from left to right there is consensus about the open 
society, liberal democracy, market economy, and some 
measure of public welfare (disagreement on level of taxation 
and social benefits), but overall, systemic agreement about the 
model. As a result, voters are uniting along different lines e.g.: 
globalization; defense; cyberspace privacy; sustainability; new 
public management; balance of cities and countryside; financial 
sector regulation. 

E2) Postmodern deconstruction, globalization and the above-mentioned
erosion of previous narratives are undermining our
grand narratives about reality, which used to keep societies
together, i.e. religion, nation and class - and to some extent also
science. As they lose their explanatory power, some are
rediscovered in totalitarian form.

E3) The only grand narrative that has survived is “the free 
market”, which provides consumer goods efficiently but is 
incapable of solving any of the problems stated above. In fact, 
it fuels them. 


Science, policy and stakeholder engagement 

Experiences from national and international science and 
technology advisor activities regarding these issues are 
documented through interviews (film), text and policy 
initiatives. Further, experiences are discussed regarding 
stakeholder and citizen engagement. Finally, as an example, it 
is documented and discussed how and why part of the Danish 
Parliament and the Press regrettably have detached themselves 
from part of reality and now live in a post-factual subculture 7 
with respect to the impending societal impact of technology. 

Acknowledgements. The Manifesto text 8 has been through several 
iterations mainly influenced by Lene Andersen, the working groups 
from the Lorentz Center workshop 9 , discussions at the Santa Fe 
Institute, citizen discussions as well as policymaker meetings. I am 
greatly indebted to the many people involved in this process. 
Abstract

Software-based artificial life will increase the robustness, and
enable vastly increased size, of computing systems. To enhance
human potential and protect individual liberty in future
society-scale systems, the boundary between ‘private’
and ‘public’ digital spaces—known in telephone networks
as a demarcation point or “demarc”—should be set so that
a significant amount of physical computing machinery can
be counted as fundamentally personal, for assigning rights
and responsibilities. To that end, this note offers a principle
called the carried network demarc: The machines that
you routinely carry under your own power, and their contents
and interactions, should be considered part of your
body as a matter of law and social norm. Such machines
today may be as prosaic as a watch, pacemaker, or cellphone,
but in the future you may regularly carry machines inhabited
by multitudes of beneficial alife creatures—akin to the bacterial
microbiomes that surround and perfuse our biological
bodies—that would likewise be considered you and yours in
both their physical and computational aspects. The author
solicits input from others with expertise bearing on this topic.

Physical and computational convergence 

In the ubiquitous modest-sized computers of today, the ‘random
access memory’ organization makes the physical location
of the hardware components largely irrelevant to machine
operation. But in any sufficiently large computational
system—for example as envisioned using indefinitely scalable
computer architectures (Ackley, 2013)—actual physical
distances and computational or communications distances
are inherently coupled by the speed of light. In
such fundamentally spatial computers (Beal et al., 2012,
e.g.), physically close components are inevitably faster and
cheaper to access than remote ones, and they are more likely
to share fate under the large and small vagaries of reality.

Existing location-free concepts of computation like “cyberspace”
and the “cloud” not only fail to capture but
actively obscure the physicality of computation, with the
often-overlooked consequence that the “computation” and
the “user” are imagined to exist, somehow, in utterly unrelated
spaces. We argue that view is not only manifestly false
but also insidiously dangerous—and the carried network demarc
proposal, in part, attempts to reframe it.


The idea that a human “self” is physically identical
with its natural “meat” body is certainly obvious, but to
apply that notion uncritically in future converged physical/computational
environments would put the individual
human at a crippling disadvantage. Such a view, by default,
would expect the human to attend to tasks that artificial
entities will routinely delegate to other artificial entities—
not just high-level information-processing jobs like sorting
email and other interruptions, but also far more fundamental
and autonomic tasks like maintaining location awareness
and performing continuous threat and opportunity assessment
within one’s physical/computational surroundings.

We should expect such low-level processing to be protected
by limits stronger than just property law. The state
or other actors should not be allowed to impede it without 
the most extraordinary cause, because such an intervention 
should be viewed as less like a civil forfeiture or a contract 
negotiation tactic and more like unwanted brain surgery. 
As our world becomes a converged physical/computational 
world, our bodies must be allowed to do the same. 

The carried network demarc 

Of course, as always in discussions of the rights of individuals
in societies, the problem of rights limits, overlaps,
and conflicts must be addressed. Especially in this
case, where we are proposing a high level of individual
protection, there must be limits—and importantly, “natural”
or obvious limits—to the extension of that protection.
The carried network demarc proposed in this paper’s abstract
is an attempt to make room for, but set natural limits
on, our machines to be considered part of our bodies.
We read it informally as “you are what you carry”
(or #OurMachinesOurBodies) and argue it represents
a plausible “sweet spot” along a spectrum of viewpoints.

For example, a narrower approach could draw the “body”
boundary at your skin, or some close approximation to it.
Such a view would allow an implanted pacemaker to be
“you,” but not a cellphone. An even more restrictive view
would hold that no manufactured object can be “you” regardless
of purpose or location, not even a pacemaker or
bone screw. At the other end, a more expansive alternative
would rope in all your property, from your car to your vacation
homes to that squash racquet you’ve forgotten you own.

We argue that “you are what you carry” is a better compromise
than those alternatives. Although in the future there
may well be myriads of devices literally under our skin,
monitoring or maintaining our health, it would seem at least
inelegant to require we implant or otherwise ingest our sensorimotor
interfaces to the computational world, just to earn
them equivalent protection. On the other hand, allowing
someone to claim arbitrary property as “self”, even when
they do not interact with it and are unaware of its status,
strains the key notion of utility for ongoing processing that
is intended to underlie the notion of the extended body.

One final alternative for this brief note: Why not use an
actual network demarc as the body’s demarc in computational
space? In modern telephony, a Network Interface Device
(‘NID’) forms the demarcation point between private
and public utility portions of the network. With one pair of
wires running into the house and another pair running up the
telephone pole, the NID is a clean and well-understood solution
to dividing network rights and responsibilities. Unfortunately,
the NID is a clean solution only if all transferred data
actually moves through the device—but in the converged
physical/computational world, data moves not just by wired
and wireless networks, but also video cameras and all manner
of environmental sensors public and private. There simply
is no clean chokepoint through which all data transfers
will flow. The carried network demarc recognizes that some
basic expectation of a boundary is required nonetheless.

Related work 

Questions of self and technology cut across human endeavors;
here we touch briefly on technology itself, philosophy,
and law. Mann (1997) pioneered advances in wearable computing
and augmented reality (Azuma, 1997, is an early survey);
the carried network demarc stands to regularize and
strengthen protections for such wearable machinery.

In the other direction, the “Internet of Things” (Al-Fuqaha
et al., 2015) exemplifies the accelerating technological convergence
of our physical and computational environments—
as does the growth of automated surveillance (Lyon, 1994).
Under the carried network demarc, the individual is free
to deploy a “computational skin” made of living technology
(Bedau et al., 2013)—to interact with, but also to insulate
the individual from, potentially massive environmental
computing powers. And, crucially, manufacturers of such
living technology cannot be faulted for striving ceaselessly
to make such machines loyal only to their individual.

From a more philosophical perspective, Froese (2014) offers
a recent exploration focused, like the current proposal,
on technology placed in or near the physical body—and conjectures,
as do I, that living technology stands to offer a positive
benefit-risk balance.


And finally, legal aspects will be paramount. To this 
non-lawyer computer scientist, following Lessig (2009), the 
United States Constitution looks like legacy software for 
a distributed operating system—itself forked from a much 
older codebase dating to the massively refactored Justinian 
Code (Blume, 2009), released in A.D. 534. And as usual 
in complex software, there’s often more than one way to 
implement things. In a recent controversy over cellphone 
encryption, for example, several authors (Hart and Vance, 
2016, e.g.) offer attacks and defenses framed by Fourth 
Amendment prohibitions against unreasonable search and 
seizure. It will take a shift in thinking, but the carried network
demarc will surround your future cellphone with a
Fifth Amendment defense against self-incrimination. 

Call to action 

As technological society advances, exactly where to draw 
the line between self and non-self is never precise. But to enhance
human potential and protect individual liberty, it must
be possible to include a significant amount of manufactured 
computing and communication machinery under protections 
as strong as those accorded to our bodies and our minds. 
Though cellphones have served here as an example, today 
they are far too brittle and untrustworthy for life inside the 
carried network demarc. We can do fundamentally better. 

The purpose of this paper is to seek complementary expertise
and to open discussions on how to ensure the future technological
world makes adequate room for us as individuals,
citizens, and humans. The goal is to guide the coming physical/computational
convergence into the powerful and empowering
mechanism for human liberty, development, and
knowledge that it can—but is far from certain to—become.
Abstract

Some artificial chemistries model the synthesis, evolution and 
complexity of digital life consisting of sequences of computer 
operations (opcodes) that are driven by point mutations to 
compete for memory and CPU time. One of us previously built 
Amoeba, a computer world inspired by Tierra and designed to 
study the emergence of self-replicating sequences of opcodes 
from a prebiotic world initially populated by randomly selected 
opcodes. Eventually an “ancestral opcode sequence” would 
emerge. The current version of Amoeba uses a computationally 
universal opcode basis set and the same addressing as Tierra. It 
was previously thought such changes would preclude the 
emergence of self-replicators. Instead, these modifications 
radically affect the emergence of self-replicators from the 
primordial soup; Amoeba exhibits a self-organization phase, 
after which self-replicators emerge. First, the opcode basis set 
becomes biased. Second, short opcode “building blocks” are 
propagated throughout memory space. When sufficiently dense, 
these prebiotic sequences combine to form self-replicators. 
Self-organization is quantified by measuring the time evolution 
of n-opcode sequence frequencies, the size distribution of 
sequences, and the mutual information of opcode pairs. 

Introduction 

Artificial computer worlds have been used to study such
diverse topics as artificial chemistry architectures (Suzuki,
2011; McMullin, 2012), the synthesis of life (Ray, 1992),
the emergence of life (Pargellis, 1996a), modeling life (Adami,
1995), and biological complexity (Adami, 2000).

There has been considerable debate as to how self-replicators
can emerge from a primordial “soup” of initially
random computer operations (“opcodes”). One hypothesis is 
that replication may require two or more cooperating entities 
(Eigen, 1971; Tangen, 2010). 

Amoeba is an artificial chemistry specifically designed to 
study the process of self-organization in a prebiotic world that 
eventually leads to the emergence of self-replicators. 
Amoeba’s memory space is initially loaded with opcodes 
randomly selected from a set of 25 possible opcodes. The 
Amoeba programs compete for memory space and CPU time 
and evolve through point mutations (Pargellis, 2003). 

The original version (“Amoeba-I”) used a limited set of 16 
possible opcodes and a memory topology similar to that in 
Avida (Adami, 1998) where virtual CPUs operated on short 
sequences of opcodes situated on a 2D “interaction grid” 
(Pargellis, 1996b). Complements of the opcodes themselves
were the addresses, so it was impossible to move to arbitrary
positions in memory. Amoeba-I could not simulate an infinite
Turing tape as there were no stacks assigned to the CPUs
(Adami, 1998).

Amoeba-II added two stacks for each CPU and expanded 
the opcode set (Pargellis, 2001). While these opcodes formed 
a computationally universal set, the addressing used 
opcode::address pairs, making it difficult to navigate 
throughout memory. 

Amoeba-III used a new topology for opcode memory 
space; the 2D interaction grid was replaced by 500 parallel 
“mini Turing tapes” or bands, each consisting of thousands of 
opcodes (Pargellis, 2003). Amoeba-III still used 
opcode::address pairing. 

Although some biasing of the original opcode basis set was 
observed (Amoeba-II and -III), there was no additional self-organization
of opcode sequences into “building blocks.”
Instead, a randomly generated sequence of opcodes, capable 
of self-replication, would “spontaneously emerge”. 

We report that a modified version of Amoeba (“Amoeba-IV”),
with addressing that freely accesses memory, not only
exhibits emergence, but does so using a far richer pathway. 

Description of the Amoeba-IV System 

The current version of the Amoeba system (“Amoeba-IV”) 
uses the same toroidal 2D memory space used in Amoeba-III 
(Pargellis, 2003). The opcode basis set is similar but there are 
some significant differences that have radically changed the 
self-ordering of Amoeba’s memory space, the emergence of 
ancestral replicators, and the diversity of those replicators. 

There are several major improvements. NOPs are used for 
addressing as was done with the Tierra/Avida systems. The 
self-exam process for calculating a cell’s size is more 
complex. Eight opcodes use the Avida methodology of a 
default operation modified by means of a following NOP. 

The most significant change is the use of NOP-addressing.
It was originally thought that using NOPs would require a
prohibitively long time for emergence from a soup of random
opcodes since a replicator would require at least five NOPs in
addition to a minimum of seven opcodes required in Amoeba-III
for primitive protobiotics. However, we find that Amoeba-IV
exhibits emergence by self-organizing its opcodes through
several phases. First, the opcode set coalesces into a reduced
set. Second, primordial “building-blocks,” consisting of short
opcode-sequences, are propagated throughout the memory
space. Finally, replicators emerge if the building-block density
is sufficiently high.

This propagation of primordial building-blocks leading to 
emergence is the most significant observation of our current 
Amoeba research. 

Opcode Basis Set 

The set of opcodes used in Amoeba-IV is listed in Table 1. 


No.  Opcode Abbrev.  Description
1    NOP1            Address label. Complement is “2”.
2    NOP2            Address label. Complement is “1”.
3    MALL            Allocate a virtual CPU for child.
4    COPY            Copy one opcode from parent to child.
5    DIVD            Initiate child CPU.
6    IFAG            If AX > BX, skip next opcode.
7    IFAL            If AX < BX, skip next opcode.
8    JMPB            Jump IP back to NOP complement.
9    JMPF            Jump IP forward to NOP complement.
10   CALL            CALL address (similar to JMPF).
11   RETN            Return to opcode after CALL.
12   ADRB            Search back for address, put in EX.
13   ADRF            Search forward for address, put in EX.
14   SWEF            Switch EX value with FX value.
15   TOGS            Toggle active stack (Stk-A and Stk-B).
16   PSHA            Push AX (BX) value onto active stack.
17   POPA            Pop active stack value to AX (BX).
18   ADDA            AX = AX + BX (BX = AX + BX)
19   SUBB            BX = BX - CX + 1 (AX = AX - CX)
20   BEQA            BX = AX (AX = BX)
21   AEQZ            AX (BX) = 0
22   INCA            Increment AX (BX) register by one.
23   DECA            Decrement AX (BX) register by one.
24   INCD            Increment DX register by one.
25   DECD            Decrement DX register by one.


Table 1: Description of opcodes used in Amoeba. 

Differences in the Opcode Definitions. Ideally, an opcode 
basis set is computationally universal, enabling a system to 
develop algorithms of arbitrary complexity. But, the choice of 
basis set is not the only criterion for a Turing machine. 
Another requirement is the ability to move to any location in 
memory. Maley showed that the Tierra system could simulate 
a Turing machine, although somewhat inefficiently (Maley, 
1994). Amoeba-IV addresses this issue by means of the 
Tierra-like NOP-addressing. 

The JMPB, JMPF and CALL opcodes move the pointer to 
the address complement of the following NOP(s). In the 
absence of such a NOP, the IP will jump to the address stored 
in the EX address register if an opcode such as ADRB or 
ADRF had previously loaded the EX register. 
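
The addressing rule just described can be sketched in Python. This is a minimal illustration, not the Amoeba-IV source. The function names (read_template, find_complement, jump), the wrap-around search along a band, the convention of landing just past the matched label, and the fall-through to the next opcode when neither a NOP template nor an EX value is available are all assumptions made for the example.

```python
# Complement pairs taken from Table 1: NOP1 <-> NOP2.
COMPLEMENT = {"NOP1": "NOP2", "NOP2": "NOP1"}


def read_template(band, ip):
    """Collect the run of NOPs that immediately follows the opcode at ip."""
    n = len(band)
    template = []
    pos = (ip + 1) % n
    while band[pos] in COMPLEMENT and len(template) < n - 1:
        template.append(band[pos])
        pos = (pos + 1) % n
    return template


def find_complement(band, ip, direction):
    """Search backward (-1) or forward (+1) for the complementary NOP label.

    Returns the index just past the matched label, or None if the jump
    opcode has no NOP template or no complement exists in the band.
    """
    template = read_template(band, ip)
    if not template:
        return None
    target = [COMPLEMENT[op] for op in template]
    n, k = len(band), len(target)
    for step in range(1, n):
        start = (ip + direction * step) % n  # assumed periodic band
        if all(band[(start + j) % n] == target[j] for j in range(k)):
            return (start + k) % n  # assumed landing convention
    return None


def jump(band, ip, direction, ex=None):
    """New IP for JMPB (direction=-1) or JMPF/CALL (direction=+1)."""
    dest = find_complement(band, ip, direction)
    if dest is not None:
        return dest
    if ex is not None:
        return ex  # fall back to the address previously stored in EX
    return (ip + 1) % len(band)  # assumed behavior when nothing applies
```

For instance, with the band NOP2, COPY, INCA, JMPB, NOP1, DIVD, a JMPB at index 3 reads the template NOP1, searches backward for NOP2, and resumes at the COPY in index 1.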

Several opcodes (PSHA, POPA, ADDA, SUBB, BEQA,
AEQZ, INCA, DECA) invoke the Avida methodology. A 
default operation is modified if the following opcode is a 
NOP. The alternate action is shown in parentheses in Table 1. 
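
As a rough illustration of this default/alternate scheme, the sketch below applies the six register-only opcodes from Table 1 to a small register dictionary. The function name execute_arith and the dictionary layout are hypothetical, and the sketch assumes that either NOP1 or NOP2 selects the alternate action; whether the modifying NOP is then skipped by the IP is not stated in the text and is not modeled here. The stack opcodes PSHA and POPA follow the same pattern but are omitted for brevity.

```python
NOPS = {"NOP1", "NOP2"}


def execute_arith(op, regs, next_op):
    """Apply one NOP-modifiable opcode to regs (keys "AX", "BX", "CX").

    The default action is used unless next_op is a NOP, in which case
    the alternate action (shown in parentheses in Table 1) is used.
    """
    alt = next_op in NOPS
    ax, bx, cx = regs["AX"], regs["BX"], regs["CX"]
    target = "BX" if alt else "AX"
    if op == "INCA":
        regs[target] += 1
    elif op == "DECA":
        regs[target] -= 1
    elif op == "AEQZ":
        regs[target] = 0
    elif op == "ADDA":
        regs[target] = ax + bx
    elif op == "SUBB":
        if alt:
            regs["AX"] = ax - cx
        else:
            regs["BX"] = bx - cx + 1
    elif op == "BEQA":
        if alt:
            regs["AX"] = bx
        else:
            regs["BX"] = ax
    else:
        raise ValueError(f"not a NOP-modifiable arithmetic opcode: {op}")
    return regs
```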


Memory Space and Virtual CPUs 

Amoeba-IV uses a 2D memory space with periodic 
boundary conditions, organized into 500 “Tierra-like” bands, 
each with 2399 opcodes, for a total of ~1.2 × 10^6 opcodes. The
basis set has 25 unique opcodes. A cell’s sequence is confined 
to a particular band. However, cells can allocate memory for 
their children in adjacent bands and at other start positions 
within those bands. We chose 2399 opcodes per band because 
2399 is a prime number. This prevents cells from being able 
to generate an integral number of children along a band, 
creating a “barrier” to altering cell sizes in future generations. 
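
The arithmetic behind this layout is easy to check. The short Python sketch below (helper names is_prime and wrap are illustrative, not part of Amoeba) confirms the total memory size, verifies that 2399 is prime so that no cell length between 2 and 2398 tiles a band exactly, and shows one way the periodic boundary conditions could be expressed.

```python
BANDS = 500          # number of bands (from the text)
BAND_LENGTH = 2399   # opcodes per band (from the text)


def is_prime(n):
    """Trial-division primality test; adequate for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def wrap(band, pos):
    """Map any (band, position) pair onto the toroidal memory."""
    return band % BANDS, pos % BAND_LENGTH


total_opcodes = BANDS * BAND_LENGTH  # 1,199,500, i.e. about 1.2 x 10^6

# Cell sizes that would fit an integral number of times along a band.
tiling_sizes = [s for s in range(2, BAND_LENGTH) if BAND_LENGTH % s == 0]
```

Because tiling_sizes is empty, any cell longer than one opcode leaves a remainder when copies are laid end to end along a band.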

There are 2000 virtual CPUs, allocated in two ways. First, 
at the start of a new generation, 5% of the CPUs are assigned 
to sequences of randomly generated opcodes. Second, a cell 
(opcode sequence) allocates the next CPU in the queue (using 
the MALL opcode) prior to copying opcodes to its child. A 
new generation starts when all CPUs have been allocated. 

Each CPU has four numerical registers, two address 
registers, two stacks, and an instruction pointer (IP) that 
operates on opcodes. Additional parameters include the cell’s 
size and its IP location (band and position within the band). 
CPUs are accessed sequentially. Each CPU is given a slice of 
CPU time that is proportional to its cell’s size. Slices range 
from a minimum of 6 to a maximum of 100 operations. 
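
A virtual CPU of this kind might be represented as follows. This is a sketch under stated assumptions: the class and field names are hypothetical, the register names follow Table 1 (AX to DX numerical, EX and FX address), and the proportionality constant ops_per_opcode is a placeholder because the text gives only the limits of 6 and 100 operations.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualCPU:
    # Four numerical registers.
    ax: int = 0
    bx: int = 0
    cx: int = 0
    dx: int = 0
    # Two address registers.
    ex: int = 0
    fx: int = 0
    # Two stacks (Stk-A and Stk-B) and the index of the active one.
    stacks: list = field(default_factory=lambda: [[], []])
    active_stack: int = 0
    # Instruction pointer location and the size of the cell being run.
    band: int = 0
    pos: int = 0
    cell_size: int = 0


def time_slice(cell_size, ops_per_opcode=1.0, lo=6, hi=100):
    """Operations granted per turn: proportional to size, clamped to [lo, hi].

    ops_per_opcode is an assumed constant; only the limits are from the text.
    """
    return max(lo, min(hi, round(cell_size * ops_per_opcode)))
```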

Figure 1 is a snapshot of part of Amoeba’s memory space 
(color-coded opcodes) along with a schematic of a virtual 
CPU. An opcode sequence in one of the bands is expanded. 






Figure 1: A small portion of the Amoeba memory space is shown at 
the top. Opcodes are color-coded; the color key is at the bottom. 

In Figure 1, IPs move from left to right along a band unless 
jumped to a NOP-address along the band by means of a 
JMPB, JMPF, CALL, or RETN opcode. 
Evolution and Mutations

Replicators evolve in three ways: opcode mutations; randomly 
generated sequences; and cells overwriting each other’s code. 
Opcodes are mutated at two different times. Each time a cell 
copies an opcode to its child (COPY opcode) the probability 
that opcode is substituted by another is 0.005. Each time a cell 
initiates its child (DIVD opcode), the child’s opcode sequence 
can undergo one of three types of mutations, for a total rate of 
0.10. Empirically chosen mutation rates are insertion (0.02), 
deletion (0.02), and substitution (0.06). Divide mutation rates 
above 0.20 randomize opcodes faster than building-blocks
critical for replication are propagated, while rates below 0.05
overly bias the opcode basis set; NOPs dominate and some 
opcodes critical for replication become too infrequent. 
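
The two mutation steps can be sketched as below, using the rates quoted above. The function names are hypothetical, and several details are assumptions not given in the text: at most one divide mutation is applied per child, the affected position is chosen uniformly, and a substituted opcode is drawn uniformly from the other 24 opcodes.

```python
import random

# The 25 opcodes of Table 1.
OPCODES = [
    "NOP1", "NOP2", "MALL", "COPY", "DIVD", "IFAG", "IFAL", "JMPB", "JMPF",
    "CALL", "RETN", "ADRB", "ADRF", "SWEF", "TOGS", "PSHA", "POPA", "ADDA",
    "SUBB", "BEQA", "AEQZ", "INCA", "DECA", "INCD", "DECD",
]


def copy_opcode(op, rng, rate=0.005):
    """COPY step: with probability `rate`, substitute a different opcode."""
    if rng.random() < rate:
        return rng.choice([o for o in OPCODES if o != op])
    return op


def divide_mutation(seq, rng, p_ins=0.02, p_del=0.02, p_sub=0.06):
    """DIVD step: apply at most one insertion, deletion, or substitution."""
    child = list(seq)
    r = rng.random()
    if r < p_ins:
        child.insert(rng.randrange(len(child) + 1), rng.choice(OPCODES))
    elif r < p_ins + p_del:
        if child:
            del child[rng.randrange(len(child))]
    elif r < p_ins + p_del + p_sub:
        if child:
            i = rng.randrange(len(child))
            child[i] = rng.choice([o for o in OPCODES if o != child[i]])
    return child
```

Passing an explicit random.Random instance as rng keeps runs reproducible, which is convenient when comparing emergence times across mutation rates.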

At the start of each new generation, 100 sequences, each 
consisting of one, two, or three randomly selected opcodes, 
are randomly distributed throughout memory space. The 
probability of inserting a random sequence is peaked at the 
middle of the 500 bands. The middle bands tend to be 
“melted”, but the edge bands have no random sequences. 
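
One way to picture this seeding step is the sketch below. The text does not give the shape of the band distribution, so a triangular profile peaked at the middle band is assumed purely for illustration; the function name and the choice to place each sequence at a uniformly random position within its band are likewise assumptions. The count of 100 sequences matches 5% of the 2000 virtual CPUs.

```python
import random

BANDS = 500
BAND_LENGTH = 2399
OPCODES = [
    "NOP1", "NOP2", "MALL", "COPY", "DIVD", "IFAG", "IFAL", "JMPB", "JMPF",
    "CALL", "RETN", "ADRB", "ADRF", "SWEF", "TOGS", "PSHA", "POPA", "ADDA",
    "SUBB", "BEQA", "AEQZ", "INCA", "DECA", "INCD", "DECD",
]


def seed_random_sequences(memory, rng, count=100):
    """Write `count` random sequences of 1 to 3 opcodes into memory.

    memory is a list of BANDS lists, each holding BAND_LENGTH opcodes.
    Returns the (band, start, sequence) triples that were written.
    """
    placed = []
    for _ in range(count):
        # Assumed profile: density peaks at the middle band and falls
        # toward zero at the edge bands.
        band = min(int(rng.triangular(0, BANDS, BANDS / 2)), BANDS - 1)
        start = rng.randrange(BAND_LENGTH)
        seq = [rng.choice(OPCODES) for _ in range(rng.randint(1, 3))]
        for j, op in enumerate(seq):
            memory[band][(start + j) % BAND_LENGTH] = op
        placed.append((band, start, seq))
    return placed
```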

There is no write-protection in Amoeba. The MALL 
opcode only defines the start position for a child’s opcodes 
and assigns it a virtual CPU. 

Anatomy of a Typical Self-replicator 

Figure 2 shows the anatomy of a typical self-replicator.

Figure 2: Anatomy of a replicator. Run: 06-08-6140. Gen: 3.499M.

The opcode color-coding is as in Figure 1. This replicator
has four “replicator genes” (shown on the left-hand side): Self-Exam
(ADRB, NOP1, ADRF, NOP2, SUBB), Copy Loop
(NOP1, COPY, INCA, IFAG, JMPB), Biological, where a
virtual CPU is allocated and then initiated (MALL, DIVD),
and Reset registers (INCD, POPA, JMPB). Note that the
genes overlap and/or can be split into multiple pieces.

The main feature of the Amoeba-IV system is the self-organization
of the initially random distribution of opcodes in
memory.

Self-organization Leads to Emergence

The development and evolution of an Amoeba system from
the initial random distribution of opcodes to the emergence of
“ancestral self-replicators” occurs during three main stages:
prebiotic, protobiotic, biotic (Pargellis, 1996a). The prebiotic
phase coalesces the original set (alphabet) of 25 opcodes into
a reduced, biased set of about 15 to 20 opcodes. This reduced
set continues to self-organize, generating short sequences of
n-opcodes (“n-ops”) that propagate critical building blocks
required for replication. An inefficient proto-replicator
emerges, usually within a million generations, when the
building-block density is sufficiently high. Mutations drive
the proto-replicators to evolve into robust replicators that
eliminate unneeded opcodes and unroll the copy-loop
(multiple {COPY, INCA} sequences per loop).

Figure 3 shows the distribution of (binned) emergence
times for a set of 25 runs out of a total of 49 runs.
Figure 3: Distribution of self-replicator emergence times.


The opcode basis set always biases into a consolidated set 
of about 20 opcodes within the first 50,000 generations. 
Nearly half of the emergences occur within the first million 
generations. One cause for the steady drop in emergence 
probability with time is the continued biasing of the opcode 
basis set. Significantly, after ten million generations more 
than 25% of the opcodes are either NOP1 or NOP2. This 
reduces the frequencies of other opcodes critical for 
replication, including conditional opcodes (IFAL and IFAG) 
and branching opcodes (JMPB and JMPF).

Self-organization

The propagation of n-ops throughout memory can be
visualized in the (partial) screen-shots shown in Figure 4:
(A) the initial random opcode distribution, (B) prebiotic
self-organization, and (C) the post-emergence state
dominated by self-replicators.



Figure 4: Self-organizing memory space in Amoeba. (A) Initial
random opcodes; (B) Prebiotic propagation of short opcode
sequences; (C) Post-emergence generation of self-replicators.
Color-coding shown in key at bottom. Run: 07-08-2015 54654.


The initially random opcode distribution of Figure 4A 
becomes partially ordered during the self-organization phase 
of Figure 4B. Some opcodes are replicated sequentially, 
shown by the horizontal lines of a single color. A series of 
snippets are visible in the left-hand third of Figure 4B. Figure 
4C shows that after emergence, the memory map consists of 
thousands of replicators across bands and within bands. 

We can quantify the memory-map organization of Figure 4 
by calculating the two-position opcode correlations within a 
band. Opcodes are uncorrelated at the beginning of a new run. 
Once self-replicators emerge, opcodes are correlated over 
multiples of the replicator lengths. We have calculated the 
correlations within bands over windows 100 opcodes wide. 
We do not show those results here because the correlations do
not add significantly to what we already learn from visual 
inspection of the memory space. 

Size Distribution of Opcode Sequences. The
self-organization and increasing frequencies for selected
n-ops lead to a population of children with steadily growing
sizes.

Figure 5 shows the growth in the sizes of children. Initially,
the sizes are mostly one to three opcodes as this is the size
range for the (100) randomly generated sequences placed
throughout the Amoeba world at the beginning of each new
generation. Longer opcode sequences become more prevalent,
eventually leading to the emergence of a proto-replicator. It
should be noted that some Amoeba runs have generated large
proto-replicators with sizes of several hundred opcodes.

Figure 5: Evolution of sizes (number of opcodes) for children.
Protobiotics emerge at about 70,000 generations and steadily evolve
into a distribution of robust replicators. Run: 07-12-2015_31648.


Frequency Growth of Critical n-opcode sequences. A
fundamental difference between Amoeba-IV and earlier
versions is that the “ancestral” protobiotic does not suddenly
emerge through some fortuitous combination of opcodes. The
inclusion of NOPs for addressing in the Amoeba-IV state
machine makes such an ancestor’s spontaneous emergence
highly improbable. To date, the smallest self-replicators
contain about 15 opcodes (14 if the replicator “guesses” its
size, i.e. does not have the self-exam gene). This results in
$25^{14} \approx 10^{19.6}$ combinations, of which a tiny
(not easy to estimate) fraction are self-replicators.

Surprisingly, rather than losing self-replicator emergence, 
the Amoeba memory space self-organizes over a period of 
several hundred-thousand generations by propagating opcode 
sequences of ever growing length and complexity. Eventually, 
in about half the runs, a proto-replicator ancestor emerges that 
quickly evolves into a population of robust replicators. 

Case Example: Self-organization to Emergence 

In this section, we analyze one example for the emergence 
from an initial primordial “soup” of random opcodes. We first 
present the anatomy of the emerged replicator, followed by 
data on the self-organization that leads to the propagation of 
building blocks for the “replicator genes.” 

The most complicated replicator gene is the copy-loop 
because it includes machinery for copying opcodes from the 
parent to its child, a branch opcode to repeat the loop, and a 
conditional check that breaks out of the loop. This means a 
typical copy-loop consists of some version of {NOP, COPY, 
INCA, IFAG, JMPB}. As Figure 2 above shows, this can be
complicated in cases where parts of other genes, such as the 
Self-Exam gene, are embedded in the Copy-Loop gene. 

In this section, we choose a somewhat unusual case using a
CALL-RETN copy-loop rather than the more common loop
ending with a JMPB opcode. We chose the CALL-RETN case
because there are no extraneous opcodes (parts of other
replicator genes) embedded in the copy-loop, despite the fact
that the CALL-RETN loop has six opcodes instead of five.
(Actually, the example shown here unrolled the {COPY,
INCA} combination very quickly. We will show the
building-block development that led to this.)

Anatomy for a “CALL-RETN” Replicator. The anatomy of 
a “CALL-RETN” replicator is shown in Figure 6. There are 
14 unique opcodes in this replicator that are useful for 
replication. Irrelevant opcodes (introns) have been neglected 
for brevity. The CALL-RETN replication method is rare 
because it requires two opcodes (CALL and RETN in that 
order) for closing the copy-loop. Most replicators just use one 
opcode (JMPB) to close their copy and reset loops. 





Figure 6: Anatomy of a replicator that uses the CALL/RETN 
combination for both the (extended) COPY loop and IP return. Run: 
11-10-2015_25271. Gen: 321,982. 


The CALL opcode is handled similarly to the JMPF
opcode; CALL first looks for one or more NOPs immediately
following. In the absence of subsequent NOPs (as in this
example), CALL will check its EX-register for any loaded
address and then jump to that address. In this case, the
ADRB-NOP2 combination has loaded the EX-register with the
complementary address, “1” (NOP1). The RETN is used both
for the COPY loop and resetting the IP after the DIVD.
(Amoeba ignores the DIVD operation unless the parent had
first allocated a CPU for its child by means of the MALL
opcode). The ADRB is replaced with ADRF in only 2000
generations. This is frequently seen in Amoeba as a parent
automatically puts the beginning of its child sequence in the
child’s CX register. The only advantage of retaining the
ADRB opcode is to capture “rogue IPs”, enabling the host to
commandeer the virtual CPUs associated with other cells in
the same band.

Growing Single-Opcode Frequencies. Initially, all 25
possible opcodes are equally distributed and the frequency
(fraction, $f(m_j)$, of all 1.2 million opcodes) is the same;
$f(O_j) = 0.040$ for all $O_j$. However, the frequencies for
some opcodes useful to propagating sequences of opcodes
preferentially grow at the expense of other opcodes.

Figure 7 shows the increase in frequency over time
(left-hand scale) for selected single opcodes (1-ops).
Emergence occurred at about 292,000 generations (labeled,
vertical line).

On the right-hand scale in Figure 7, we plot the
“monomeric opcode entropy”, $S(O)$, as a quick check into the
status of self-organization of single opcodes, given by

$$S(O) = -\sum_{j=1}^{D} \frac{m_j}{M} \ln \frac{m_j}{M} \qquad \text{(Eq. 1)}$$

where $D = 25$ is the size of the alphabet (number of unique
opcodes in the basis set), $M = 1.2 \times 10^{6}$ is the size
of the memory space (total number of opcodes), $m_j$ is the
number of occurrences (counts) for the $j$-th opcode, given by
the symbol $O_j$, and $\ln$ is the natural log. An estimate of
the effective size of the opcode basis set is
$E(O) = \exp(S)$. For the initial, equally distributed
opcodes, $S(O) = 3.218$ and $E(O) = 25$. By 500,000
generations, the entropy drops to $S(O) = 2.935$, indicating
the useful part of the basis set shrinks to about 19
opcodes. Most runs evolve a basis set of 15 to 16 opcodes.
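
As a hedged illustration of the entropy measure just described, the following Python sketch computes S(O) and the effective basis-set size E(O) = exp(S) from a list of opcode symbols. The function names and the toy uniform memory are hypothetical and are not taken from the Amoeba code; the sketch only assumes that the entropy is the natural-log Shannon entropy of the single-opcode frequencies.

```python
import math
from collections import Counter


def monomeric_entropy(opcodes):
    """S(O) = -sum_j (m_j / M) ln(m_j / M), summed over opcodes present."""
    counts = Counter(opcodes)          # m_j for each opcode symbol
    total = sum(counts.values())       # M, the size of the memory space
    return -sum((m / total) * math.log(m / total) for m in counts.values())


def effective_basis_size(entropy):
    """Effective number of opcodes in the basis set, E(O) = exp(S)."""
    return math.exp(entropy)


# Toy memory (assumption): 25 symbols, each occurring equally often.
uniform_memory = [symbol for symbol in range(25) for _ in range(4)]
s_uniform = monomeric_entropy(uniform_memory)
e_uniform = effective_basis_size(s_uniform)
```

For the uniform toy memory the entropy equals ln 25 and the effective size is 25; feeding the reported value S(O) = 2.935 into `effective_basis_size` gives roughly 19 opcodes, matching the text.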

Figure 7: Growth of 1-ops and commensurate decrease in entropy for
the CALL-RETN replicator of Figure 6. Run: 11-10-2015_25271.


Initially, 1-ops conducive to replication grow in frequency: 
NOP1, NOP2, MALL. The NOP1 and NOP2 frequencies 
begin to increase about 70,000 generations before emergence. 
The NOPs are arguments for the CALL and ADRB opcodes. 

The ADRB opcode is not required in order for a cell to
calculate its size and load that value into its BX-register.
Amoeba automatically loads a child’s beginning location
in the band into the CX-register when a parent first allocates
memory (using the MALL opcode) for its child. However,
ADRB is useful for resetting the CX registers of rogue IPs,
essentially hijacking other virtual CPUs for the host. Once the
replicator population becomes dominant, the ADRB is no
longer useful. In this case example, the ADRB opcode
mutated into ADRF which also loaded the EX-register with
the NOP1 address (used by CALL). Consequently, f(ADRF)
doubles and f(ADRB) drops.

After emergence, COPY and INCA become more prevalent 
when the copy-loop is “unrolled” and the combination 
{COPY, INCA} is repeated. 

Growing Multi-Opcode Frequencies. The co-occurrence of
two opcodes, and therefore the degree of opcode
self-organization, can be quantified using the mutual
information (MI) measure (Manning, 2000),

$$MI = \sum_{\{O_i, O_j\}} P(O_i, O_j) \ln \frac{P(O_i, O_j)}{P(O_i)\,P(O_j)} \qquad \text{(Eq. 2)}$$

where the sum is over all $25^2 = 625$ possible opcode pairs
for our basis set of 25 opcodes. The argument in the sum is
the pointwise mutual information, $MI(O_i, O_j)$, which is the
log of the odds-ratio of opcode pairs and is zero in the
absence of any correlation.
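
The mutual information measure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: pairs are taken to be adjacent opcodes in a linear sequence, the single-opcode probabilities are estimated from the first and second positions of those pairs, and the function name is hypothetical. The paper itself reports using the CMU toolkit for counting n-ops.

```python
import math
from collections import Counter


def pair_mutual_information(opcodes):
    """MI = sum P(a, b) ln[P(a, b) / (P(a) P(b))] over adjacent pairs."""
    pairs = list(zip(opcodes, opcodes[1:]))
    n_pairs = len(pairs)
    joint = Counter(pairs)                    # counts of each 2-op
    first = Counter(a for a, _ in pairs)      # counts in first position
    second = Counter(b for _, b in pairs)     # counts in second position
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n_pairs
        # Odds-ratio: joint probability over product of the marginals.
        odds_ratio = p_ab / ((first[a] / n_pairs) * (second[b] / n_pairs))
        mi += p_ab * math.log(odds_ratio)
    return mi


# A strictly alternating sequence is strongly ordered; a constant
# sequence has a single pair type and carries no mutual information.
mi_ordered = pair_mutual_information(["COPY", "INCA"] * 50)
mi_constant = pair_mutual_information(["NOP1"] * 100)
```

Pairs that never occur contribute nothing to the sum, so only observed 2-ops are visited rather than all 625 combinations.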

We plot MI versus time with the dashed blue line in Figure
8 (right-hand scale). The MI shows clearly that
self-organization has occurred. Critically, what we previously
noted as the time of emergence coincides with the maximum
increase in the mutual information. Scientifically, this is a key
finding. We demonstrate the degree of ordering for the joint
probability versus the uncorrelated single probabilities for a
select set of opcode pairs (2-ops) by plotting the time
evolution of the “odds-ratio”, the argument of the natural
logarithm in (Eq. 2), in Figure 8. We used the CMU toolkit for
counting n-ops (Clarkson, 1997).

Figure 8: MI (RH) and Odds-ratios (LH) for 2-ops that lead to the
COPY-RETN building block for the replicator of Figure 6.


A key observation is that 2-ops critical to development of 
the CALL-RETN loop, {CALL, COPY, INCA, RETN}, grow 
in abundance at least 50,000 generations before emergence 
during the self-organization period. The ADRB opcode gets 
replaced by the ADRF opcode once it is no longer useful after 
emergence and the {RETN, NOP1} drops after emergence 
because introns are inserted between the RETN and NOP1. 

Development of the Copy-Loop Building Block. The copy- 
loop building block is noteworthy since the primordial COPY- 
RETN loop, {CALL, COPY, INCA, RETN}, is clearly a 
prebiotic building-block; it is impossible for an IP to break out 
of this loop! 

This sequence copies opcodes throughout memory, but 
without a conditional check (IFAG or IFAL) the IP can never 
break the loop and initiate a child. Nevertheless, this 
propagator is a useful building block, capable of propagating 
itself and other opcodes. An insertion mutation could have 
modified this building block prior to emergence by inserting 
an IFAG conditional check before the RETN opcode.

Figure 9 is a schematic showing how the copy-loop is built 
from the CALL-RETN building-block. 



Figure 9: Timeline for the copy-loop building block. 


The CALL opcode calls the NOP1 address (see Figure 6). 
Usually, the CALL opcode is followed by a NOP and the call 
is to that NOP’s complement. In the absence of a subsequent 
NOP, the CALL looks into its address register, EX, to see if it 
has an address. In this run, the previous ADRB opcode loaded 
the EX-register with NOP1, the complement to NOP2. Note: 
the {ADRB, NOP2} combination was many opcodes prior to 
the CALL-RETN loop in prebiotic propagator sequences. 
There was also a subsequent NOP1 in the primordial form of 
the CALL-RETN loop. An intron preceded the NOP1 during 
pre-emergence. 

The vertical dashed blue lines show the time range over
which each of the sequences exist. The solid blue diamonds
are when we first saw the sequence in either a log-file or a
memory snapshot. Times of extinction are shown by solid
blue circles. Once a sequence occurs, it is propagated for
many generations until replaced by a more viable alternative.
For example, the {CALL, COPY, INCA, RETN} sequence
persists until the copy-loop is unrolled (generation 239,000).


About 200,000 generations later, this proto-replicator 
evolved into a robust replicator that used ADRF and SUBB 
opcodes to properly calculate its size. 


Emergence of Ancestral Replicators 

A consequence of the openness of the Amoeba-IV system is
the diversity of ancestral replicators that emerge from the
self-organizing, initially random “soup” of opcodes.

It is a challenge to identify the emergence of an “ancestor” 
in Amoeba-IV. Ancestors are logged in a file when a parent 
has faithfully copied itself to children for at least four 
generations. However, the early “protobiotic” replicators do 
not copy themselves with any fidelity. It typically takes 
several thousand generations before a “faithful” replicator is 
generated and logged. During this time, millions of cells have 
been initiated and some small subset of that number will 
eventually lead to an ancestor that faithfully copies its 
opcodes to its children. It is possible that an ancestor emerges 
out of a hypercycle of interacting components (Eigen, 1971) 
but this type of interaction is nontrivial to track. One can 
examine the “world snapshots” that are periodically saved, but 
each snapshot is a list of the entire Amoeba memory; the 
analysis of any interactions within that map is difficult. 

The next examples show the rich diversity of replicators 
that is one of the novel outcomes of the modified Amoeba-IV 
artificial chemistry. 

“Size-guesser” Protobiotic. An example of a class of robust
proto-replicator that has emerged in several runs is shown in
Figure 10. This is an inefficient replicator; it does not include
the “self-exam” gene so it “guesses its size” and cannot
generate a complete copy of the parent until the 3rd child.



Figure 10: Anatomy of a replicator that guesses its size; the self-exam 
gene is missing. Run: 06-17-2015_54809. Gen: 1.042M. 


“Conditional Ladder” Replicator. The anatomy of a
“conditional ladder” replicator is shown in Figure 11.

Figure 11: Anatomy of a replicator that uses a “conditional ladder” of
IFAL opcodes to avoid resetting registers (DIVD, POPA, SUBB).
Run: 06-20-2015 31615. Gen: 4.315M.


This replicator’s COPY loop is inefficient because it
includes the entire cell’s opcode sequence. An “IFAL-ladder”
is used to avoid prematurely resetting the cell’s registers while
copying its code to a child. When done copying, the
replicator’s 4th IFAL check fails and the BX register is set by
means of the 2nd SUBB opcode. The IFAL ladder checks
then “fail” and the replicator initiates its child (DIVD) and
resets its AX register (POPA).


Discussion and Future Research 

There are several other interesting observations in addition to
the success in generating self-replicators due to opcode
self-organization from a primordial soup consisting of
Tierra-like opcodes and addresses.

Many proto-replicators lack some of the replicator genes
shown above in Figure 2. An example of a proto-replicator
without the self-exam gene was shown in Figure 10. Other
replicators do not retain their IP. This was observed in earlier
versions of Amoeba where cells are members of a colony. The
colony members lose their IP after generating a single child.
Their IPs then execute opcodes “belonging to” other cells
further down their respective Tierra-band.

We observe runs where replicators cheat and get a larger 
CPU time-slice. This issue arises because the time-slice is 
proportional to a cell’s size. Earlier versions of Amoeba 
automatically incremented the AX-register as part of the 
COPY opcode; the size of a cell was the number of opcodes 
copied. Amoeba-IV now requires the INCA opcode to 
increment the AX-register. This raises an interesting question: 
how to determine the size of a child when the DIVD opcode 
initiates it? We cannot increment the size every time a COPY 
is used since prebiotics can copy the same opcode many times 
(they lack the INCA opcode). Currently, we use the AX- 
register value to determine the size. The “cheaters” copy their 
replicator sequence, typically about 30 opcodes. Then, prior to 
the DIVD opcode, they use multiple ADDA opcodes to 
increase the AX-value to many times the cell’s size. We are 
still investigating how to prevent this parasitic behavior 
without decreasing replicator genome diversity. 

The Amoeba systems have always been dominated by 
variants of a single species after emergence. We have never 
observed two radically different species co-existing. Viruses 
occur, but are quickly eliminated when the hosts mutate. For
example, the retention of the ADRB opcode, even though not
required for a self-exam, effectively “hijacks” viral IPs and
their associated CPUs. We believe a variation in the externally
imposed fitness landscape may enable alternative species, as 
well as parasites and hosts, to co-exist for extended times. 
Amoeba-IV partially addresses this by varying the rate at 
which random sequences are distributed throughout the bands. 
However, replicators are still able to quickly scatter their 
children throughout all bands. There are several options to 
consider: slowing down the rate at which children are spread 
throughout the bands, imposing write-protection when a cell 
allocates memory for a child (similar to Tierra), or modifying 
the time-slice parameters for different regions in memory so 
that replicator sequences of different sizes would find some 
parts of memory inhospitable. 

Only half (25 out of 49) of the Amoeba runs exhibit 
emergence (see Figure 3). We observe that the probability of a 
replicator emerging after about one million generations is low. 
This implies that there are some self-organization processes 
that can lower the probability of emergence. One observation 
is that the NOP1 and NOP2 opcodes steadily become more 
prevalent in runs without emergence. This reduces the 
likelihood of creating building-block propagators because 
other opcodes critical to replication occur less frequently. 

The earlier Amoeba versions I and II demonstrated that the
probability of a randomly generated opcode sequence being a
replicator increased with size (number of opcodes). It has
been argued that computation-universal chemistries such as
Avida or Tierra would exhibit a probability that decreased
with increasing size (Adami, 1998). Preliminary observations
with Amoeba-IV indicate this may not be the case (ancestral
replicators typically include 200 to 300 opcodes), and this
aspect of our findings warrants further exploration. While
ancestors in Amoeba-IV are generated from building-blocks
that are propagated during the self-organization phase, these
“genes” are generated multiple times in huge ungainly
“protobiotics” with sizes ranging up to the maximum
currently allowed (450 opcodes per cell).

Another topic of interest is the issue of “propagators,” 
“protobiotics,” and “replicators.” Amoeba-IV has shown it is 
very difficult to identify precisely when an ancestral replicator 
“emerges.” It may be that groups of propagators form 
hypercycles that eventually lead to robust self-replicators 
(Eigen, 1971; Eigen, 1981; Eriksson, 2006). We are currently 
investigating this by logging sequences that have been 
generated at least 50 times, regardless of whether or not the 
sequences are the same as their parental sequence.

Abstract

Many origin of life theories argue that molecular
self-organization explains the spontaneous emergence of
structural and dynamical constraints. However, the
preservation of these constraints over time is not
well-explained because of the self-undermining and
self-limiting nature of these same processes. A process
called autogenesis has been proposed in which a synergetic
coupling between self-organized processes preserves the
constraints thereby accumulated. This paper presents a
computer simulation of this process (the Autogenic
Automaton) and compares its behavior to the same
self-organizing processes when uncoupled. We demonstrate
that this coupling produces a second-order constraint that can
both resist dissipation and become replicated in new
substrates over time.

Introduction 

Life’s ability to resist degradation and persist in hostile
environments is both ubiquitous and astonishing. Generation
of structure, preservation by repair, and trait persistence
through reproduction are perpetually organized in a
continuous struggle against the destabilizing mechanisms of
the second law of thermodynamics. Despite their often
pivotal role in explaining the emergence of life,
self-organizing processes are limited in their capacity to
maintain structure (Prigogine and Stengers, 1984; King,
1982). Autogenesis is a recently proposed theory that
suggests that, beyond mere self-organization, a synergetic
coupling between self-organizing processes is a minimal
requirement for life (Deacon, 2012). Through this
higher-order linkage, the processes that generate structure
may persistently recreate a capacity for self-creation, leading
to robustness and a potential capacity for long-term
sustenance and natural selection. An instance of such an
autogenic process, a proto-life model called autogen, shows
how two self-organizing processes, reciprocal catalysis and
self-assembly, maintain each other’s boundary conditions and
thereby mutually increase their probability of persistence
over time (Deacon, 2006).

Currently, the autogen model is a theoretical proposition
that remains to be validated experimentally. This paper
describes a series of simulation experiments that investigate
the self-organizing and self-preserving properties of the
autogen model. A simplified particle system simulation called
the Autogenic Automaton models the synergetic linkage of
self-organizing processes that leads to the emergence of
autogens.

Second-Order Self-Organization 

The nonlinear amplification that is typical for
self-organization tends to push the thermodynamic conditions
for further propagation toward the unfavorable (Haken, 2006).
This may occur up to a point where the system is no longer
far-from-equilibrium and the local thermodynamic entropy
increase comes to a halt. For example, in a reciprocally
catalytic set, reaction rates may increase exponentially as
more and more catalysts are produced, up until the point when
not enough reactants are available for further propagation and
the self-organizing process ends (Plasson et al., 2011).
Alternatively, self-organization may break down due to
unfavorable changes in external conditions. Considering the
universal presence of self-organization in living systems, how
can it be possible that organization persists long enough for
complex organisms to come about?

When the product of an autocatalytic reaction enables
a second autocatalytic reaction, which produces a reactant
that enables the first (or a third, etc., as long as the causal
chain is eventually closed), a so-called hypercycle emerges
(Eigen and Schuster, 1979). Hypercycles represent one
possible way in which self-organizing processes, autocatalytic
cycles in this case, may be linked together in a dynamical
process hierarchy. However, with respect to preventing
dissipation this kind of second-order self-organization does
not provide a sufficient solution: every autocatalytic cycle
that the hypercycle consists of represents a potential weakest
link, which may cause the fragile hypercycle to break
down entirely when the reactants or energy necessary for
the cycle are no longer available. Conversely, autogenesis
suggests another type of second-order self-organization
where two or more self-organizing processes not only
promote each other, but also act as a supportive environment
if one of them breaks down in such a way that their
self-undermining tendencies are reciprocally counteracted.

Autogenesis 

The formation of crystals through self-assembly is a
self-organizing process as the probability of particle
detachment decreases with the number of adjacent particles
that keep it in place. This asymmetric process causes
particles to cluster together, creating a spatial difference in
particle locations. This difference, maintained by the
probabilities of attachment and detachment, may be viewed as
a constraint on the spatial distribution of particles. More
generally, this reduction of variety of macroscopic states can
be understood as a constraint producing process.

In a reciprocally catalytic system, each reaction initially 
leads to an increased probability for another reaction to take 
place as more and more catalysts are created. Exponential 
growth ensues until the reactants are depleted. Reciprocal 
catalysis leads to exponential increase of reactions that is 
limited solely by the number of available reactants. 



Figure 1: Illustration of autogen formation (left) and the
reactions involved (right). Autogens constitute a dynamical
linkage between self-assembly and reciprocal catalysis. In
this model system, self-assembly is a self-organizing process
where G particles attach to one another, forming $G_n$
crystals of size $n$. These crystals may break up due to
detachment. G particles are generated by a reciprocally
catalytic set of six different particle types (A to F). In turn,
crystals may contain C and F particles, isolating these
catalysts from potential reactants and thereby ensuring a
potential for G-particle production over time.

A boundary or container would prevent the exhaustion of
reactants by shielding them from the environment, thereby
preserving a chemical potential for further dissipation
(Maturana and Varela, 1980; Rosen, 1991). Such a container
may itself be formed by a self-organizing process, e.g. crystal
growth through self-assembly (Fellermann et al., 2007).
The autogen model (fig. 1) goes further in suggesting that
the form and function of a self-assembled container is
dynamically linked to the catalytic process as it prevents the
reactants from being depleted; similarly, it explains how the
catalytic process dynamically shapes the form and function
of the crystals as it affects the process of self-assembly.


Following the type of second-order self-organization
described above as autogenesis, constraint preservation is
enabled by a juxtaposition of constraint producing processes,
such that they actively support each other’s persistence.
Whereas self-promoting self-organizing processes such as
hypercycles tend toward self-undermining and ultimately a
breakdown of the causal cycle, this reciprocally
counteracting juxtaposition actively prevents
self-undermining from taking place. As the autogen is able to
do work on its own conditions for sustenance, it grows
independent from the conditions of its environment and
becomes more dependent on its internalized constraint. When
the autogen is damaged, it likely begins to repair itself; under
some conditions the probability of growth and sustainment
may become higher than that of breakdown.

The relative stability of these structural synergies allows
for a simple type of natural selection to occur, as some
will be better suited to prevailing conditions than others and
therefore have a better chance of sustaining themselves. This
eventually leads to a higher-level reduction of variety, as
unsuccessful variety-reducing synergies are removed.

The Autogenic Automaton 

The processes that generate and preserve constraint and
which are necessary for autogen formation are simulated in
a program called the Autogenic Automaton. This simulation
does not provide a physically accurate model, but is merely
aimed at demonstrating the viability of the proposed
constraint hierarchy (McMullin and Varela, 1997; Varela et
al., 1974). A two-dimensional tile grid is used as a discrete
model of a closed reaction-diffusion system (fig. 2). Six
particle types are used for modeling reciprocal catalysis while
a seventh type models the formation of crystals through
self-assembly, following figure 1. Particle movement and the
reaction rules that govern particle attachment, detachment,
and the creation and removal of particles are all computed
locally per tile. As such, the system resembles a cellular
automaton (Bilotta and Pantano, 2005) where multiple
particles may occupy one tile. Particle movement and diffusion
caused by particle-to-particle collisions is approximated by
random movement to a horizontally or vertically neighboring
tile with a probability of 0.1 at every time step for each
particle. With respect to movement, crystals are treated as
single particles together with any particles they encapsulate.
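
The movement rule can be illustrated with a minimal Python sketch. It assumes the 10 x 10 non-toroidal grid mentioned in the caption of Figure 2 and the stated move probability of 0.1; the function names are hypothetical, and the treatment of the grid edge (a move that would leave the grid is rejected and the particle stays put) is an assumption, since the text does not specify it. Crystals and reactions are not modeled here.

```python
import random

GRID_SIZE = 10    # 10 x 10 tiles
MOVE_PROB = 0.1   # probability that a particle moves in one time step
NEIGHBOR_OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def move_particle(position, rng):
    """Return the particle's tile after one time step."""
    if rng.random() >= MOVE_PROB:
        return position                      # particle does not move
    x, y = position
    dx, dy = rng.choice(NEIGHBOR_OFFSETS)    # horizontal or vertical step
    new_x, new_y = x + dx, y + dy
    # Assumption: moves off the non-toroidal grid are rejected.
    if 0 <= new_x < GRID_SIZE and 0 <= new_y < GRID_SIZE:
        return (new_x, new_y)
    return position


def step(positions, rng):
    """Apply the movement rule independently to every particle."""
    return [move_particle(p, rng) for p in positions]


rng = random.Random(0)
particles = [(rng.randrange(GRID_SIZE), rng.randrange(GRID_SIZE))
             for _ in range(200)]
moved = step(particles, rng)
```

Several particles may end up on the same tile, which is consistent with the description of multiple particles occupying one tile.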

The simulation is initialized by distributing predefined
quantities of particle types randomly on the grid. Next,
particle positions and reactions are computed per tile.
Modeling only localized reactions ensures that only small
subsets of the total number of particles interact at each time
step, reducing the computational complexity of the simulation.
Furthermore, only the aspects of self-organization that are
necessary for demonstrating the viability of autogenesis are
modeled. Physical properties such as kinetic energy, heat
dissipation, and crystal geometry are not simulated. Due
to this absence of heat and friction, the entropy potential
necessary for far-from-equilibrium systems to become
self-organizing is defined with respect to chemical equilibrium
rather than thermodynamic equilibrium. The simulation is
initialized with chemical non-equilibrium conditions, that is,
before any crystallization has occurred or catalytic reactions
have taken place.

Figure 2: A closed continuous particle system (left) is
modeled approximately in the Autogenic Automaton as a
non-toroidal discrete grid of 10 x 10 tiles (right).

Quantifying Constraint 

Through the course of a simulation run, the system moves
through various macroscopic states caused by processes that
generate, preserve and select constraint. Constraint can be
expressed by means of the information entropy over the
spatial probability distribution of particles or reactions in the
tile grid (Kauffman, 1993; Harder and Polani, 2013; Polani,
2008). Notable characteristics of the particle system surface
by observing the (in-)homogeneity of locations that
correspond to these characteristics. So, the quantification of
statistical entropy described below yields an indirect
observation of the underlying processes.

Given a set of probabilities $p_i$ for $i = 1, \ldots, n_{\text{tiles}}$ with
$0 \le p_i \le 1$ and $\sum_i p_i = 1$, the information entropy is
defined as (Shannon, 1948; Cover and Thomas, 1991):

$$H = -\sum_{i=1}^{n_{\text{tiles}}} p_i \log_2 p_i .$$


When we consider an event (i.e. particle occurrence or
reaction) $X$, we substitute

$$p_i = \frac{|X_i|}{|X|} ,$$

with $|X_i|$ the number of events at tile $i$ and $|X| = \sum_i |X_i|$
the total number of events, to obtain

$$H = -\sum_{i=1}^{n_{\text{tiles}}} \frac{|X_i|}{|X|} \log_2 \frac{|X_i|}{|X|} .$$

For ease of interpretation we often consider the so-called
normalized information entropy (Jost, 2006)

$$H(X) = -\frac{1}{\log_2 n_{\text{tiles}}} \sum_{i=1}^{n_{\text{tiles}}} \frac{|X_i|}{|X|} \log_2 \frac{|X_i|}{|X|} , \qquad (1)$$

which normalizes the standard information entropy by its
maximum value such that always $0 \le H(X) \le 1$. For
a completely homogeneous distribution of events over tiles
we get $H(X) = 1$, whereas $H(X) = 0$ when all events $X$
are concentrated at a single tile.

Where it is necessary to measure the spatial difference
between two event types, $X$ and $Y$, the Kullback-Leibler
divergence of their distributions over the grid is used
(Kullback and Leibler, 1951):

$$D_{KL}(P \| Q) = \sum_{i=1}^{n_{\text{tiles}}} p_i \log_2 \frac{p_i}{q_i} .$$

Substituting $p_i$ and $q_i$ with $\frac{|X_i|}{|X|}$ and $\frac{|Y_i|}{|Y|}$ respectively would
yield infinite divergence for a distribution with a tile $i$ such
that $|X_i| > 0$ and $|Y_i| = 0$. To resolve this problem, a
smoothing function is used (Bigi, 2003), where

$$q_i = \begin{cases} \alpha \frac{|Y_i|}{|Y|} & \text{for } |Y_i| > 0 \\ \epsilon & \text{for } |Y_i| = 0 \end{cases} \qquad (2)$$

with $\epsilon = 10^{-5}$ and normalization coefficient $\alpha$ chosen such
that the probabilities sum to 1. $p_i$ is substituted similarly
with $\frac{|X_i|}{|X|}$. For ease of exposition, we will omit this
smoothing in subsequent formulas.


Constraint Generation

The Autogenic Automaton is used here to simulate the
generation of constraint in the formation of crystals through
self-assembly, and in the locally nonlinear reactions taking
place in a reciprocally catalytic set. In later sections, these
two processes will be combined to simulate autogenesis.

Self-Assembly

Self-assembly is modeled in a simplified way, by defining
attachment and detachment reactions between G particles
and crystals $G^n$:

$$G^n + G \rightleftharpoons G^{n+1} , \qquad (3)$$

with $n \ge 1$ and $G^1 = G$. At every time step, when a G
particle is located within the same tile as either another G
particle or crystal, the probability of attachment is given by
reaction parameter $\gamma^+$:

$$P^+_G = \gamma^+ \in [0, 1] .$$


Once formed, G particles have a probability $P^-_G$ of detaching
from the crystal again. Larger crystals are more tightly
connected and less likely to break apart than smaller crystals
due to a larger number of kinks holding the individual
particles together (Burton and Cabrera, 1949). An increased
size yields a lower probability of detachment and therefore
increases the probability for further growth. This introduces
asymmetry in the crystal growth process, reflected in our
model system by a detachment probability function that is
negatively exponential to the crystal size $n$ with reaction
parameter $\gamma^-$:

$$P^-_G = (1 + \exp[\gamma^-])^{-n} ,$$

with $\gamma^- \in \mathbb{R}$, $n \ge 2$. Following equation (1), event $X_i$ is
defined with respect to self-assembly as the observation of
a G particle at tile $i$, where $G^n$ crystals are counted as $n$
observations. Therefore, the generation of constraint during
this process is examined using

$$H(G, t) = -\frac{1}{\log_2 n_{\text{tiles}}} \sum_{i=1}^{n_{\text{tiles}}} \frac{|G_i(t)|}{|G(t)|} \log_2 \frac{|G_i(t)|}{|G(t)|} ,$$

for the normalized information entropy of G at time $t$.

Figure 3: Decrease of normalized information entropy
$H(G, t)$ during self-assembly of 1000 G particles with
$\gamma^+ = 1$, for several detachment probabilities: $\gamma^- = -5$,
$\gamma^- = -4$, $\gamma^- = -3$, $\gamma^- = -2$ and $\gamma^- = -1$. The images on
the right depict how $H(G, t)$ correlates with the distribution
of G particles over the grid, ranging from an almost
homogeneous distribution (top) to a few $G^n$ crystals (bottom).
Results are averaged over 100 trial runs.

Figure 3 shows the development of $H(G, t)$ over time, for
different values of $\gamma^-$. With $\gamma^- = -5$ the probability of
detachment $P^-_G$ is relatively high, leading many crystals to fall
apart and the constraint on particle G locations being low.
With $\gamma^- \in [-2, -1]$, $P^-_G$ is relatively low: once formed,
crystals only break apart sporadically, leaving hardly any G
particles available for attachment and potential for further
growth. At $t = 5000$, the G particle locations are maximally
constrained for $\gamma^- = -4$.

Reciprocal catalysis

Particle types A to F are used to model self-organization
through reciprocal catalysis. Particles of type A and type B
may react to form a C particle when both are located in the
same tile; similarly for particles D and E forming F:

$$A + B \overset{F}{\rightleftharpoons} C , \qquad (4)$$

$$D + E \overset{C}{\rightleftharpoons} F . \qquad (5)$$


Particles F and C are catalysts for the left-to-right reactions
of (eq. 4) and (eq. 5), respectively. In order to accommodate
the system dynamics required in later sections, reaction
probability $P^+_R$ decreases exponentially with $1/n$, where $n$ is
the number of catalysts present at the same tile $i$, i.e.
$n = |F_i|$ for (4) and $n = |C_i|$ for (5):

$$P^+_R = (1 + \exp[\rho^+])^{-(1+n)^{-2}} ,$$

with $\rho^+ \in \mathbb{R}$. The right-to-left reactions (i.e. C splitting
into A and B, and F into D and E) occur with probability
$P^-_R$ for each C and F particle at every time step:

$$P^-_R = \rho^- \in [0, 1] .$$
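
To make the effect of catalysts concrete, the following sketch evaluates the forward reaction probability for a few catalyst counts. It assumes the probability has the form (1 + exp[rho+])^(-(1+n)^(-2)), which is our reading of the formula above; the function name `p_catalysis` is illustrative.

```python
import math


def p_catalysis(rho_plus, n_catalysts):
    """Forward reaction probability for n catalysts in the same tile.

    Assumed form: (1 + exp(rho+)) ** -((1 + n) ** -2).
    Without catalysts (n = 0) this is the logistic value 1 / (1 + exp(rho+));
    each additional catalyst shrinks the exponent and raises the probability.
    """
    exponent = -((1 + n_catalysts) ** -2)
    return (1.0 + math.exp(rho_plus)) ** exponent


# Probabilities for rho+ = 6 and zero to three catalysts in the tile.
for n in range(4):
    print(n, p_catalysis(6.0, n))
```

Under this assumption a single catalyst raises the reaction probability by roughly two orders of magnitude for rho+ = 6, which illustrates why reactions tend to cluster where catalysts are already present.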

Similar to self-assembly, reciprocal catalysis is a locally
nonlinear process: one catalytic reaction increases the
likelihood of another catalytic reaction occurring. However, the
observable artifacts of reciprocal catalysis (i.e. the produced
catalysts) may cease to exist once this amplification process
no longer takes place, or they may diffuse to different
locations. To quantify the generated constraint, we therefore use
the probability distribution of reaction locations as observed
events, rather than the locations of catalysts themselves. The
constraint generated by reciprocal catalysis is quantified by a
decrease in normalized information entropy. Following (eq. 1),
event $X_i$ is defined as the observation of catalytic reaction
$R$ at tile $i$:

$$H(R, t) = -\frac{1}{\log_2 n_{\text{tiles}}} \sum_{i=1}^{n_{\text{tiles}}} \frac{|R_i(t)|}{|R(t)|} \log_2 \frac{|R_i(t)|}{|R(t)|} .$$

In order to investigate the effect of parameters $\rho^+$ and $\rho^-$
on the normalized information entropy, $H(R, t)$ is averaged
over time:

$$\frac{1}{t_{\max}} \sum_{t=1}^{t_{\max}} H(R, t) .$$
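
As an illustration, the normalized information entropy and its time average can be computed from per-tile event counts as sketched below. The function names and the list-of-counts input format are assumptions made for this example.

```python
import math


def normalized_entropy(counts):
    """Normalized information entropy of per-tile event counts, in [0, 1].

    counts: sequence with one non-negative event count per tile.
    Tiles without events contribute nothing (0 * log 0 is taken as 0).
    """
    n_tiles = len(counts)
    total = sum(counts)
    if n_tiles < 2 or total == 0:
        raise ValueError("need at least two tiles and one event")
    h = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return h / math.log2(n_tiles)


def time_averaged_entropy(counts_per_step):
    """Mean normalized entropy over a sequence of time steps."""
    values = [normalized_entropy(counts) for counts in counts_per_step]
    return sum(values) / len(values)


uniform = [1] * 100              # events spread evenly over 100 tiles
concentrated = [5] + [0] * 99    # all events in a single tile
print(normalized_entropy(uniform), normalized_entropy(concentrated))
```

The two example inputs correspond to the limiting cases described for equation (1): a homogeneous distribution and a fully concentrated one.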
Figure 4: Normalized information entropy during reciprocal
catalysis after initialization with 1000 particles distributed
equally among types A, B, D, and E. For given $\rho^+$ and $\rho^-$,
$H(R, t)$ is averaged over 5000 time steps and 10 trial runs.
The right-side images depict the distribution of catalytic
reactions over the grid.

Results for $t_{\max} = 5000$ are shown in figure 4. It is found
that distribution $R$ is maximally constrained for $\rho^+ \approx 6$ and
$\rho^- \ge 0.5$ (i.e. when catalysts break up regularly).

Constraint Preservation 

Under particular extrinsic conditions (i.e. $\gamma^+$, $\gamma^-$, $\rho^+$ and
$\rho^-$) self-assembly and reciprocal catalysis generate
constraint spontaneously. If these conditions are subject to
unfavorable changes, the constraints produced will also be
eliminated spontaneously. Here, we consider how a higher-order
linkage between these processes may preserve constraint
by preventing spontaneous dissipation in unfavorable
conditions.

The autogenic process consists of a mutually constraining 
coupling. G particles generated by reciprocal catalysis are 
created in close proximity to one another due to the locality 
of catalytic amplifications, thereby increasing the likelihood 
of crystal growth; at the same time G n crystals preserve a 
potential for reciprocal catalysis, by encapsulating catalysts 
and thereby preventing exhaustive catalysis. 

Modeling this synergetic linkage, the left-to-right reaction
of (eq. 5) is changed as follows:

$$D + E \rightarrow F + G .$$

In order to balance out the production of G particles and
keep the system (approximately) closed, G particles are
removed from the simulation with probability
$P^- = (1 + \exp[\gamma^-])^{-1}$ for every G at each time step. Furthermore,
reaction (eq. 3) is modified such that crystal growth leads
to the encapsulation of any C or F particles located in the
same tile, while crystal breakup results in a release of C and
F particles:

$$G^n(kC, mF) + G + pC + qF \rightarrow G^{n+1}((k+p)C, (m+q)F) , \qquad (6)$$

$$G^n(kC, mF) \rightarrow G^{n-1} + G + kC + mF ,$$

with $k, m, p, q \ge 0$ and $n \ge 2$, and where a crystal of size
$n$ containing $k$ C particles and $m$ F particles is denoted as
$G^n(kC, mF)$.
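
The encapsulation and release rules can be sketched with a small data structure. The `Crystal` class, the function names and the detachment probability (1 + exp[gamma-])^(-n) used in `step` are assumptions made for illustration; they are not taken from the authors' code.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Crystal:
    size: int = 2       # n, number of G particles in the crystal
    c_inside: int = 0   # k, encapsulated C catalysts
    f_inside: int = 0   # m, encapsulated F catalysts


def attach(crystal, free_c, free_f):
    """Grow G^n to G^(n+1), encapsulating the free C and F in the tile."""
    return Crystal(crystal.size + 1,
                   crystal.c_inside + free_c,
                   crystal.f_inside + free_f)


def detach(crystal):
    """Shrink G^n to G^(n-1) and release one G plus all catalysts.

    Returns (smaller crystal, (released G, released C, released F)).
    """
    released = (1, crystal.c_inside, crystal.f_inside)
    return Crystal(crystal.size - 1, 0, 0), released


def p_detach(gamma_minus, n):
    """Assumed detachment probability for a crystal of size n."""
    return (1.0 + math.exp(gamma_minus)) ** (-n)


def step(crystal, gamma_minus, rng):
    """One time step: the crystal either breaks up or stays intact."""
    if rng.random() < p_detach(gamma_minus, crystal.size):
        return detach(crystal)
    return crystal, (0, 0, 0)


grown = attach(Crystal(2, 1, 0), free_c=2, free_f=3)
smaller, released = detach(grown)
```

Keeping the catalyst counts on the crystal itself makes the coupling explicit: catalysts inside a crystal are unavailable for reactions until a breakup returns them to the tile.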

Parameterization 

The results in figure 3 show that a decrease in $H(G, t)$ may
occur when $\gamma^+$ is fixed at 1. Similarly, figure 4 shows that
$\rho^- \ge 0.5$ allows for relatively low values of $H(R, t)$. Here,
we investigate the ranges of $\gamma^-$ (horizontal axis) and $\rho^+$
(vertical axis) that allow for self-assembly and reciprocal
catalysis to occur simultaneously.




Figure 5: After initializing the simulation with 1000
particles uniformly distributed among types A, B, D and E,
we run it for 5000 time steps with $\gamma^+ = 1$, $\rho^- = 0.5$
and $\gamma^-, \rho^+ \in [-10, 10]$. The four figures above show the
average normalized information entropy of G particle
locations (top left), the average normalized information
entropy of catalytic reaction locations (top right), the average
symmetrized Kullback-Leibler divergence with smoothing,
where $SD_{KL}(\gamma^-, \rho^+) = \max(SD_{KL})$ if no occurrences
are found (bottom left), and the sum of these three figures,
where $SD_{KL}$ has been normalized using scaling coefficient
$\beta$ (bottom right).

The redundancy between the distributions of G particle
locations and catalytic reactions is considered to be an
indication of the amount of interaction, i.e. it measures whether
crystals tend to be located in proximity to catalytic
reactions and vice versa. This redundancy is quantified using the
Kullback-Leibler divergence with smoothing (eq. 2), which
is symmetrized to obtain a commutative measure:


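$$SD_{KL}(G, R, t) = D_{KL}(G(t) \| R(t)) + D_{KL}(R(t) \| G(t))$$

$$= \sum_{i=1}^{n_{\text{tiles}}} \left[ \left( \frac{|G_i(t)|}{|G(t)|} - \frac{|R_i(t)|}{|R(t)|} \right) \log_2 \frac{|G_i(t)| / |G(t)|}{|R_i(t)| / |R(t)|} \right] ,$$

with smoothing (eq. 2) applied if necessary.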
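
A minimal sketch of the smoothed, symmetrized divergence is given below. It assumes per-tile counts as input and a smoothing constant of 1e-5; the helper names are illustrative. The normalization coefficient follows from requiring the smoothed probabilities to sum to 1, so it equals 1 minus the smoothing constant times the number of empty tiles.

```python
import math

EPSILON = 1e-5  # assumed value of the smoothing constant


def smoothed(counts, eps=EPSILON):
    """Turn per-tile counts into probabilities without zero entries."""
    total = sum(counts)
    if total == 0:
        raise ValueError("need at least one event")
    n_empty = sum(1 for c in counts if c == 0)
    alpha = 1.0 - eps * n_empty  # makes the probabilities sum to 1
    return [alpha * c / total if c > 0 else eps for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; p and q must be strictly positive."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))


def symmetrized_kl(x_counts, y_counts):
    """D_KL(X || Y) + D_KL(Y || X) on smoothed distributions."""
    p, q = smoothed(x_counts), smoothed(y_counts)
    return kl_divergence(p, q) + kl_divergence(q, p)


print(symmetrized_kl([5, 0, 0, 5], [0, 5, 5, 0]))
```

When no tile is empty the coefficient is 1 and the smoothing has no effect, so applying it unconditionally matches applying it only where necessary.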

Autogenesis requires that self-assembly and reciprocal
catalysis both take place in each other’s proximity. The
desired values for parameters $\gamma^-$ and $\rho^+$ are therefore
estimated by minimizing

$$H(G, t) + H(R, t) + \beta \, SD_{KL}(G, R, t) ,$$

where coefficient $\beta$ scales $SD_{KL}(G, R, t)$ to $[0, 2]$,

$$\beta = 2 \left( \max_{\gamma^-, \rho^+ \in [-10, 10]} SD_{KL}(G, R, t) \right)^{-1} ,$$

such that the normalized information entropies and the
divergence between the distributions contribute equally to the
sum.
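
The parameter selection can be sketched as a search over precomputed results. The function name, the dictionary layout and the example numbers below are hypothetical; only the scoring rule (two normalized entropies plus the divergence rescaled to [0, 2]) follows the text.

```python
def select_parameters(results):
    """Pick the parameter pair with the lowest combined score.

    results: dict mapping (gamma_minus, rho_plus) -> (H_G, H_R, SD_KL),
    where each value is already averaged over time and trials.
    Returns (best parameter pair, beta, dict of scores).
    """
    max_sd = max(sd for _, _, sd in results.values())
    beta = 2.0 / max_sd if max_sd > 0 else 0.0  # rescales SD_KL to [0, 2]
    scores = {params: h_g + h_r + beta * sd
              for params, (h_g, h_r, sd) in results.items()}
    best = min(scores, key=scores.get)
    return best, beta, scores


# Made-up numbers purely for illustration.
example = {
    (-2, 6): (0.50, 0.40, 1.0),
    (0, 0): (0.90, 0.90, 4.0),
    (-5, 6): (0.95, 0.30, 2.0),
}
best, beta, scores = select_parameters(example)
```

Since each normalized entropy lies in [0, 1], their sum lies in [0, 2], which is why the divergence is rescaled to the same range before the three terms are added.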


Comparison 



Figure 6: Normalized information entropy $H(G, t)$ during
autogenesis, with $\gamma^+ = 1$, $\rho^+ = 6$, $\rho^- = 0.5$, and $\gamma^- = 0$
for $t \in [0, 1000]$, and 1000 particles evenly distributed
among types A, B, D and E initially. For $t \in [1001, 5000]$,
$\gamma^- = -4$, $\gamma^- = -3$, $\gamma^- = -2$, $\gamma^- = -1$ or $\gamma^- = 0$.
Results are averaged over 1000 trials.

Figure 7: Normalized information entropy $H(G, t)$ during
self-assembly (without reciprocal catalysis). Here, instead
of being generated by a catalytic reaction, G particles are
added to random grid locations at the same rate as in the
previous simulation (see fig. 6). Again, $\gamma^+ = 1$, $\rho^+ = 6$,
$\rho^- = 0.5$, and $\gamma^- = 0$ for the first 1000 time steps, while
thereafter $\gamma^- = -4$, $\gamma^- = -3$, $\gamma^- = -2$, $\gamma^- = -1$ or
$\gamma^- = 0$. Results are averaged over 1000 trials.

Figure 6 shows the normalized information entropy when,
after an initial phase with conditions that enable
self-assembly, the value of $\gamma^-$ is changed. For $\gamma^- \in [-5, -4]$
after $t = 1000$, the high probability of detachment is not
conducive to the prolonged persistence of crystals, and they
fall apart. For $\gamma^-$ at 0, $H(G, t)$ continues to develop
unperturbed. Changing $\gamma^-$ to $-1$ results in a lower normalized
information entropy, as more G particles detach and
subsequently attach to larger crystals (cf. fig. 3). With $\gamma^-$ changed
to $-2$, $H(G, t)$ initially drops and eventually finds a new
equilibrium at a value higher than with $\gamma^- \in [-1, 0]$.

Figure 7 shows a similar experiment where the particles
necessary for reciprocal catalysis (A to F) are omitted from
the experiment. Here, G particles are added at random grid
locations at the same rate as they were generated by
reciprocal catalysis in the previous experiment. Spatial proximity
is therefore no longer biased by reciprocal catalysis and the
absence of catalysts excludes the possibility of
encapsulation.

Comparing both figures, we find that for $\gamma^- = -2$
constraint is preserved when a synergetic linkage between
self-assembly and reciprocal catalysis is present, while it largely
falls apart in the case of mere self-assembly. Also for
$\gamma^- = -1$, the value of $H(G, t)$ remains lower with this
linkage than without it. These particular changes to $\gamma^-$ show
that an autogen may resist dissipation despite unfavorable
extrinsic conditions; the intrinsic dynamical constraint
between self-assembly and reciprocal catalysis allows it to
persist.
Constraint Selection

The preservation of dynamical constraint is a higher-order
process: not only is the number of macroscopic states
reduced as the spatial distribution of events becomes more
constrained, but the distribution over specific constraint
types (e.g. the specific form of crystals) is itself also
reduced. To experimentally quantify this second-order
reduction we initialize each newly formed crystal with a property
c, a random integer between 1 and 5 that affects equation (6)
by limiting the total number of catalysts that a crystal may
contain. This property c reflects how the shape of a crystal
may affect its containment capacity (fig. 8) without
explicitly modeling the physical geometry of crystals.

The experiments of figures 6 and 7 are repeated while
crystals are assigned random values for c. The average size
of crystals is shown in figure 9.

Figure 8: Due to the way G particles attach to one another,
crystals with different geometries may come about. This
illustration shows several crystals $G^n_c$ of equal size (i.e. $n = 5$)
but with different capacities (c) for containing catalysts
due to their particular shape.

Figure 9: Fixing $\gamma^-$ at $-0.5$ and $\gamma^+ = 1$, $\rho^+ = 6$, and
$\rho^- = 0.5$, the grid is again initialized with 1000 particles of
types A, B, D and E. For 1000 trial runs over 5000 time
steps, the mean crystal size and standard error of the mean
are reported for the five different containment capacities.

The difference in average crystal size between autogenesis
and self-assembly can be inferred from the results of the
previous experiments. However, the competition between
crystals with different c leads to a second notable
difference: the size of an autogen appears to be correlated with
its containment capacity. In our simulation the geometry of
a crystal does not affect its size directly, as the probabilities
of crystal formation and detachment are independent of c.
Rather, the value of c affects the size of crystals indirectly,
as the number of catalysts in a tile affects the production
of new G particles, and thereby the crystal’s capacity for
reconstitution if those catalysts are released upon detachment.
This work cycle creates a difference in size between crystal
geometries, a difference which is maintained despite the
absence of a direct causal link between crystal geometry and
the underlying self-organizing processes.

This higher-order constraint is quantified using the
normalized information entropy over the distribution of the
containment capacities of crystals. With $p_c$ the probability that
a crystal has a containment capacity c, $|G^n_c|$ the number of
crystals with capacity c and $|G^n| = \sum_c |G^n_c|$ the total
number of crystals,

$$H(G^n_c, t) = -\frac{1}{\log_2 5} \sum_{c=1}^{5} \frac{|G^n_c(t)|}{|G^n(t)|} \log_2 \frac{|G^n_c(t)|}{|G^n(t)|} .$$

Figure 10 shows the average $H(G^n_c, t)$ for the previous
experiments. Selection of crystal geometries accounts for the
difference in normalized information entropy over
containment capacities during autogenesis.

Figure 10: Higher-order constraint: the average
normalized information entropy over the distribution of crystal
capacities $H(G^n_c, t)$ is substantially lower for autogenesis
(self-assembly + reciprocal catalysis) than for self-assembly
alone. Results averaged over 1000 trial runs.


Conclusions 

The statistical distributions used to quantify
self-organization and autogenesis in this paper (G particle
locations, R reaction locations, and $G^n_c$ containment
capacities) are all expressed in terms of information entropy.
This type of quantification does not distinguish between the
physico-chemical constraints produced by self-organization
and the substrate-independent, formal constraint made
possible by autogenesis. Taking the physical processes that
underlie the maintenance of far-from-equilibrium states
into account requires further research (Beer, 2004). New
tools capable of expressing this dynamical difference need
to be developed (Deacon and Koutroufinis, 2014).

The experimental explorations described in this paper do
not quantify all aspects of autogenesis, nor do they provide
a complete overview of autogenic properties and
phenomena. Rather, they serve to demonstrate the preservation
capacity of synergetically coupled processes, and the
higher-order reduction of macrostates constituted by formal type
selection that may emerge from competition between
autogens. These two autogenic capacities may help explain the
possible emergence of proto-life. Understanding life’s
origins does not necessarily imply understanding life as we find
it around us today (Cleland, 2013), but the emergent
dynamics that created life may also take part in shaping mind and
society (Thompson, 2010) as the accumulation of generated
constraints that is allowed by preservation through a
higher-order linkage is what ultimately makes selection possible.

References 

Beer, R. D. (2004). Autopoiesis and cognition in the Game 
of Life. Artificial Life, 10(3):309-326. 

Bigi, B. (2003). Using Kullback-Leibler distance for text 
categorization. In Proceedings of the 25th European 
Conference on IR Research , ECIR’03, pages 305-319, 
Berlin, Heidelberg. Springer-Verlag. 

Bilotta, E. and Pantano, P. (2005). Emergent patterning
phenomena in 2D cellular automata. Artificial Life,
11(3):339-362.

Burton, W. K. and Cabrera, N. (1949). Crystal growth and
surface structure. Part I. Discuss. Faraday Soc., 5:33-39.

Cleland, C. E. (2013). Conceptual challenges for contemporary
theories of the origin(s) of life. Current Organic
Chemistry, 17:1704-1709.

Cover, T. M. and Thomas, J. A. (1991). Elements of
Information Theory. Wiley.

Deacon, T. W. (2006). Reciprocal linkage between
self-organizing processes is sufficient for self-reproduction
and evolvability. Biological Theory, 1(2):136-149.

Deacon, T. W. (2012). Incomplete Nature: How Mind 
Emerged from Matter. W.W. Norton and Company, 
New York, NY. 


Deacon, T. W. and Koutroufinis, S. A. (2014). Complexity 
and dynamical depth. Information , 5(3):404-423. 

Eigen, M. and Schuster, P. (1979). The Hypercycle - A
Principle of Natural Self-Organization. Springer, Heidelberg.

Fellermann, H., Rasmussen, S., Ziock, H.-J., and Sole, R. V. 
(2007). Life cycle of a minimal protocell—a dissipative 
particle dynamics study. Artificial Life, 13(4):319-345. 

Haken, H. (2006). Information and self-organization: a
macroscopic approach to complex systems. Springer.

Harder, M. and Polani, D. (2013). Self-organizing particle 
systems. Advances in Complex Systems, 16. 

Jost, L. (2006). Entropy and diversity. Oikos, 113:363-375. 

Kauffman, S. A. (1993). Origins of Order: self-organization 
and selection in evolution. Oxford University Press, 
New York, NY. 

King, G. A. (1982). Recycling, reproduction, and life’s
origins. BioSystems, 15:89-97.

Kullback, S. and Leibler, R. (1951). On information and
sufficiency. The Annals of Mathematical Statistics,
22(1):79-86.

Maturana, H. R. and Varela, F. J. (1980). Autopoiesis and
cognition. Reidel, Boston, MA.

McMullin, B. and Varela, F. J. (1997). Rediscovering
computational autopoiesis. In Husbands, P. and Harvey, I.,
editors, Fourth European Conference on Artificial Life,
pages 38-48. MIT Press, Cambridge, MA.

Plasson, R., Brandenburg, A., Jullien, L., and Bersini, H.
(2011). Autocatalysis: At the root of self-replication.
Artificial Life, 17(3):219-236.

Polani, D. (2008). Foundations and formalizations of
self-organization. Advances in Applied Self-Organizing
Systems, pages 19-37.

Prigogine, I. and Stengers, I. (1984). Order Out of Chaos. 
Bantam Books, New York, NY. 

Rosen, R. (1991). Life itself: A Comprehensive Inquiry into 
the Nature, Origin, and Fabrication of Life. Columbia 
University Press, New York. 

Shannon, C. E. (1948). A mathematical theory of
communication. Bell System Technical Journal, 27:379-423.

Thompson, E. (2010). Mind in Life. Harvard University
Press, Cambridge, MA.

Varela, F. J., Maturana, H. R., and Uribe, R. (1974).
Autopoiesis: The organization of living systems, its
characterization and a model. BioSystems, 5:187-196.





Protocells: what we have learned about minimal life and evolvability 

Steen Rasmussen¹,², Adi Constantinescu¹ and Carsten Svaneborg¹

¹ Center for Fundamental Living Technology (FLinT), University of Southern Denmark
² Santa Fe Institute, New Mexico, USA
steen@sdu.dk


Abstract 

In this paper we review lessons learned about two major
evolutionary transitions from a bottom-up construction of
protocells. We use a particular systemic protocell design
process as a starting point for exploring two fundamental
questions: (1) how may minimal living systems emerge from
nonliving materials, and (2) how may minimal living
systems support open-ended evolutionary richness?

Non-life-to-life transition 

Novel functionalities in physicochemical systems can be
generated naturally in three ways: by the assembly of
structures (equilibrium processes), by self-organization
(non-equilibrium processes) and by a combination of the two,
through the evolution of structures (Rasmussen et al., 2001).
Our approach to creating minimal living systems, which we
define as protocells (Szostak et al., 2001; Rasmussen et al.,
2004; Sole et al., 2007; Kurihara et al., 2015), utilizes both
self-assembling and self-organizing processes. We investigate
how a controlled environment together with coupled
self-assembly and externally driven self-organization may play
together to generate minimal, self-replicating,
physicochemical systems. Thus our systemic protocell
approach requires a simple metabolism controlled by
information, both kept together by a container.

We have successfully implemented a particular protocell
around a ruthenium tris(bipyridine) complex, Ru(bpy)₃, that
uses light to catalyze redox reactions on precursors of both the
amphiphiles and the information in bulk. The informational
system serves as part of an electron relay that modulates the
metabolic reaction rate, which in turn depends on the redox
potential (the nucleobase composition) of the information
molecule (DeClue et al., 2009).

In particular, we have established that the amphiphile
production can be controlled by chemical information. The
reduction potential of a nucleobase, 8-oxo-guanine [oxoG],
can be exploited by the photocatalyst to produce amphiphiles,
but not that of guanine (the next most easily oxidized
nucleobase) or, by extension, those of A, C, U and T.
Furthermore, fatty acid vesicles influence the production
rates, as a detailed investigation of the information-photocatalyst
configuration showed, especially when both oxoG
and the Ru(bpy)₃ are independently attached through
hydrophobic anchors to the container (Maurer et al., 2011).
Further, we have established a photochemical fragmentation
scheme to ligate DNA oligomers. First, the deprotection of an
oligomer is performed using a Ru(bpy)₃ photosensitizer. This
oligomer can only then, and in the presence of a template, be
ligated with another oligomer (Cape et al., 2012).

We have demonstrated several advantages of our systemic
approach integrating the three mutually supporting
components: (i) self-assembly of a decanoic acid container;
(ii) anchoring to the container a metabolic ruthenium complex
as well as (iii) a conjugated nucleic acid information complex;
(iv) container feeding and growth; (v) metabolically driven
container replication; (vi) metabolically driven nucleotide
oligomer ligation (part of replication); (vii) one-pot metabolic
production of both amphiphilic molecules and ligated
oligomers, i.e. new information molecules. These are all key
milestones toward the construction of a minimal living
system. However, one key milestone has yet to be reached before
full protocell integration can occur: implementing an effective
DNA self-replication process based on template-directed
ligation of two smaller oligomers.

Missing link: Template directed ligation 

We can derive the dependence of the overall replication rate
constant on hybridization energies, temperature and strand
length by employing a model for the minimal ligation-based
replication process of a single-stranded template in which the
ligation of oligomers is involved in the formation of the
complementary replica. Within the template-directed
replicator system, two complementary oligomers hybridize to
a single-stranded template. An irreversible ligation reaction
(i.e., formation of covalent bonds in a condensation reaction)
transforms the oligomers into the complementary copy of the
template. The newly formed double strand can dehybridize,
thus allowing for iteration of the process. Throughout the
replication mechanism, we neglect both the production of
waste and the hydrolysis of the ligation product. The resulting
overall reaction rate is derived in Constantinescu et al. (2016)
and summarized below in Fig. 1, where two cases are
discussed, both assuming parabolic template growth. When
product inhibition is rate limiting, both longer strands and
higher temperatures increase the replication rate. When
hybridization is the rate limiting factor, an optimal set of
temperatures and strand lengths exists.
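
As a brief illustration of what parabolic growth means, the sketch below assumes the common square-root rate law dc/dt = k * sqrt(c) for the template concentration c. This rate law, the function names and the parameter values are assumptions for illustration and are not taken from Constantinescu et al. (2016).

```python
import math


def parabolic_growth(c0, k, t):
    """Closed-form solution of dc/dt = k * sqrt(c) with c(0) = c0.

    The concentration grows quadratically in time, hence "parabolic".
    """
    return (math.sqrt(c0) + 0.5 * k * t) ** 2


def euler_growth(c0, k, t_end, steps=100_000):
    """Numerical check of the same rate law with explicit Euler steps."""
    dt = t_end / steps
    c = c0
    for _ in range(steps):
        c += k * math.sqrt(c) * dt
    return c


print(parabolic_growth(1.0, 2.0, 3.0), euler_growth(1.0, 2.0, 3.0))
```

Under this rate law the template population grows more slowly than exponentially, which is the usual signature of product inhibition in template-directed replication.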

In the presented protocellular system evolution may be 
defined in the following way: Compositional information, 
which is defined as the content and location of oxoG in the 
information strand, determines the metabolic reaction rates 
through an electron relay. These processes require a variety of 
environmental conditions including sacrificial proton and 
electron donors (DeClue et al., 2009, Maurer et al., 2011, 
Cape et al., 2012). 

A modification of the compositional information therefore
generally results in modified metabolic reaction rates. Thus,
for the simple protocellular model in a fixed environment, we
expect Darwinian evolution to be a metabolic reaction rate
optimization process where presumably the overall replication
rate (the phenotype) is enhanced. At the molecular - or
“genotypic” - level, this means a change (through selection
and amplification) of the compositional information of the
nucleotide strands, as they are being inherited.

Figure 1. Effective overall replication rate constant k as a function of
strand length and temperature. (a) and (b) correspond to a
template-directed replication mechanism that suffers from product inhibition
with a slow (i.e., rate limiting) and a fast (i.e., not rate
limiting) ligation reaction, respectively (after Constantinescu et al., 2016). We
note that the replication rate depicted in (a) has been obtained by
Fellermann and Rasmussen, 2011, employing thermodynamic
arguments as well as a polymer model for oligonucleotides that
allows simulation of their diffusion and hybridization behavior.
Figure 2. Connection between the details included in the simulations
and the ability for the simulations to generate targeted observables.
Left side of table summarizes the included physical model (each
row). Right side of table indicates the higher order observable
phenomena/functionalities generated by the simulation. Top of table
depicts the qualitative information details needed in a molecular
model representation (the data structure) of the simulation (columns).
Simulations with more detailed, and thus more complex, molecular
components are able to generate increasingly more complex
dynamics and functionalities. As an example, data structure D₃ has
included enough molecular interaction details to allow the simulation
to generate molecular self-assembly and e.g. micellar and vesicle
formation. The last row is left open as we conjecture: to obtain a
higher evolutionary potential we need to add more components
and/or resources to the system.


Expanding evolvability 

We know from experimental and theoretical investigations
that if constituent components and environment are too
simple, only trivial emergent structures will be generated. As
the (appropriate) diversity/complexity of the constituent
components is increased, emergence of hierarchies or
multilevel structure may occur. Thus, it seems natural to
assume this conjecture could also be extended after
self-replication and simple Darwinian evolution have been achieved
for a protocell. A discussion of the involved constituent
components of a protocellular simulation is found in Fig. 2.

In practice, more variation could include more and different
resource oligomers (short nucleotide libraries), changes in the
fatty acid composition, and adding different photosensitizer
molecules. Impact or performance could then be measured by
the resulting metabolic rate, container division properties and
life-cycle (generation) time. However, given our experimental
experiences, this would be a challenging and time-consuming
enterprise as each new component in the mix in principle
could cause undesired (destructive) side effects.

Bedau et al., 1998, propose a statistical characterization of
evolutionary processes that aims to quantify the innovative
potential of an evolutionary process by measuring the rate at
which innovative changes are produced during the evolutionary
process. In this classification scheme our protocellular
systems fall into Class 2. Class 1 is a neutral evolutionary
process and includes diffusion processes. Class 3 is defined as
evolutionary processes with an apparent open-ended ability to
innovate and includes examples from biological evolution and
technological evolution.


Conclusions 

(a) The distinction between non-living and living matter is
best characterized as a grey zone where minimal living
systems have the properties discussed above. (b) If we require
life to exhibit open-ended evolution, the presented protocell
(or for that matter, any published protocellular model we are
aware of) does not qualify as a minimal living physicochemical
process. However, if Class 2 evolution suffices, several of
the published protocellular models, if successfully integrated
and experimentally implemented, would qualify as minimal
life-forms. (c) To enhance the evolutionary potential of a
protocellular system (or any system), more richness has to be
added to the system. How this system expansion could occur
depends on the details of the system.

This work was in part supported by the EC Grant #318671. 
Abstract

This paper describes possible applications of a two-dimensional
array of programmable electrochemically active elements to
ALife. The array has been developed as part of the MICREAgents
project and, after several design phases, is now a mature
enough device for general use beyond the project. Here we
describe the general properties of the device based on the first
two design phases and some of its capabilities, including portable
experimentation, and discuss its potential application to ALife
and in education.

Device Origin and Design 

The device we describe here is part of a larger project,
MICREAgents (McCaskill et al., 2012). The original vision
of the project included two major technological components,
lablets and docks. Lablets are small (~100 x 100 x 50 µm),
autonomous electronic elements comprising a form of smart,
programmable, electrochemically active 'dust' that, unlike
conventional smart dust, communicates via pairwise interactions
rather than wireless radiation. Lablets are poured into a
solution and can interact with the surrounding solution, with
each other, and with smart surfaces. A dock is such a smart
surface: a static two-dimensional array of 256 x 256
microelectrodes (see Fig. 1) beneath a fluid film, connected to
a host computer from which each of the sites may be
independently controlled. One goal of MICREAgents was to
develop this technology to enable a new form of evolution
through the interaction of chemistry with these new hybrid
informational-electrochemical elements.

In this article we concentrate on the dock, and aim to give an 
overview of some of its properties and capabilities for ALife. 
We defer technical details to upcoming publications. 

Novel dock electronics were developed at AIS to allow
individually programmable electrochemical coating of sensors and
actuators. Chemical solutions may be applied on the dock, either
in the form of droplets covering a subset of dock sites or as a
fluid layer covering the entire lattice.

Experimental Examples 

Electrochemiluminescence (ECL). Luminescence may be
stimulated electronically (Liu et al., 2015), and this has found
widespread application in both display technology and diagnostic
tests. ECL provides a convenient optical report of electrical
activity via electrochemical reactions. We implemented a
version of ECL based on Ruthenium Tris(2,2')bipyridyl/Tripropylamine
(Zu and Bard, 2000), illustrated in Fig. 2, to
demonstrate the spatially resolved activation of reactions
possible with the dock.

Galvanic deposition. Also known as electroplating, this is one
of the chemically simplest forms of electrochemistry. Application
of a potential across two electrodes in a salt solution can
cause deposition of metal on one of the electrodes. It has been
shown that appropriate conditions (concentrations and temporal
variations in voltage application) may generate complex
structures on sub-micron scales (Sharma et al., 2008). Complex
surfaces can be useful on the nano- and micro-scales because
they can be used to enhance the surface area of catalysts
and also to form supercapacitors.


Figure 1. Left: a view of the dock close to actual size (4.6 mm square).
Right: a closeup of the dock. A unit cell of the 128 x 128 array
contains four 12 µm square electrodes (outlined by the black dotted lines),
with a differential sensor and a split actuator. Between the electrodes,
the structure of the electronic control circuitry is visible, on a layer
beneath the electrodes.

More complex electrochemical reactions involving DNA and
signal amplification can also be initiated, and the project has
investigated differential control of chemical reactions using
different voltage signals (Freage et al., 2015).

Figure 2. An implementation of spatial ECL (Ru(bpy)₃²⁺ based) on
the (CMOS1) dock, with each lighted site stimulated by the
application of a voltage to a microelectrode (global counter electrode).


Combinatorial exploration with the dock 

In general, the dock provides a high-throughput platform to
explore an extremely large space of possible chemical reactions
and chemical reaction sequences. This is true for a wide variety
of base chemistries, including DNA (Freage et al., 2015),
peptide, carbohydrate, and RNA chemistries (Chen, La, and Zhou,
2014). Efficient use in combinatorial exploration requires
reproducible cleaning of microelectrodes. Gold electrodes on the
dock can be cleaned either chemically (e.g., Fischer et al.,
2009) or physically, e.g., with CO2 snow (Kern, 1990), but some
coatings are difficult to clean without recourse to mechanical
polishing.

Portable experimental setup 

A portable experimental setup is constructed using a 3D-printed
scaffolding with a USB microscope mounted on top of
the dock’s computer controller (see Fig. 3).

Figure 3. Portable experimental setup for the dock. The
microscope camera is mounted atop an adjustable aluminum tube,
mounted onto a 3D-printed scaffolding (black, with lights inside),
fitted onto a 3D-printed dock holder (black), sitting atop
the USB dock-controller box (red). The Raspberry Pi host
computer (running Linux) is shown on the right. Cables
are removed for clarity.

Because of the nature of the dock’s electronic fabrication, it
is relatively easy to make many copies, so the dock could also
be used in an educational context, with each student able to
have their own copy of a dock.


Application to artificial life 

The dock should be useful for novel origin of life experiments,
to discover chemistry that enables the transition from
nonliving to living matter. A version of the Miller-Urey
experiment could be implemented, with the dock’s spatial separation
and control giving far more experimental range. Redox potentials
provide a specific source of energy, and specifically coated
electrodes provide a programmable distribution of mineral or
organic catalysts that can allow controlled investigation of
complex spatially resolved chemical evolution.

The dock is also able to interact in programmable ways with
microparticles, including the lablets discussed previously. Such
interplay between autonomous programmable mobile electrochemical
elements and smart docking surfaces may allow the
construction of artificially self-reproducing systems with both
electronic and chemical facets (Tangen et al., 2015). Further,
McCaskill has proposed electronic genomes that can direct
chemistry and are heritable. Wills and McCaskill have conceived
the electrochemical equivalent of a genetic code: this
time coupling copyable electronic systems with hard-to-copy
chemical systems, rather than DNA genes and proteins.

The high throughput capabilities of the dock could be further 
enhanced with fluidic overlays. The simplest possible overlay 
would simply enable laminar flow of fluid across the dock from 
one side to another. Standard PDMS microfluidics could create 
128 independent channels across the dock. 

This work was supported by the European Commission under 
Grant #318671. 
Abstract

For the experimental construction of artificial cells, it is a challenge to
simultaneously supply the nutrients and lipids required for protein
synthesis, gene replication, membrane growth, and fission. Reactions
inside liposomes cannot continue indefinitely because nutrients are
exhausted: liposomes do not have pores or channels in their
membranes for acquiring nutrients. In this study, we demonstrated
that liposomes containing an in vitro translation system can be fused
with liposomes encapsulating RNA by freeze and thaw to achieve GFP
synthesis. The fusion mixed the lipid molecules of the two kinds of
liposomes and was followed by fission. Consequently, we observed GFP
synthesis inside the liposomes after liposome fusion and fission. This
freeze and thaw method can be repeated to supply nutrients
sustainably alongside liposome growth. We hope this method
will help achieve the ultimate goal of establishing artificial cells that
can acquire nutrients sustainably and proliferate by coupling protein
synthesis and gene replication with membrane growth.

Introduction 

Reconstructing life-like compartments with inner biochemical
reactions is an important step toward elucidating the border
between life and non-life. The process could give us insights into
the origin of life. Repetitive cycles of simple biochemical
reactions in liposomes have already been achieved. Recently,
proliferation of liposomes was achieved with inner DNA
replication systems (Kurihara et al., 2015). We also reported a
sustainable RNA replication reaction with liposome
proliferation by a freeze and thaw method similar to that in
this work (Tsuji et al., 2016). Briefly, we mixed two types of
liposomes, centrifuged them, froze them in liquid nitrogen, and
thawed them at room temperature, which resulted in liposome
fusion and fission. Although these reports succeeded in
reconstructing life-like phenomena, protein synthesis compatible with
proliferation of liposomes has not been achieved yet. It has
been reported that the PURE system, an in vitro translation system
reconstituted from purified components, could be supplied by
liposome fusion and that protein synthesis occurred after fusion
(Caschera et al., 2011). In this study, we show that the PURE
system can also be supplied to liposomes by freeze and thaw
and, moreover, compatibly with the proliferation of liposomes.


Result 

GFP synthesis induced by liposome fusion 

We first tried to apply liposome fusion induced by freeze and 
thaw for supplying PURE system. Previously we reported 
liposome fusion by freeze and thaw (Tsuji et al., 2016). We 
prepared liposomes encapsulating RNA which encodes GFP 
(RNA liposomes), liposomes encapsulating proteins of the 
PURE system (+nutrient liposomes), and liposomes 
encapsulating buffer without RNA and the proteins (-nutrient 
liposomes). After mixing up the RNA liposomes and 
+nutrient or -nutrient liposomes, we centrifuged the 
liposomes to produce a liposomal pellet, and fused them by 
freeze and thaw. Then we incubated the liposomes for 3 hours 
to induce GFP synthesis. First, we analyzed the size
distribution of liposomes before and after freeze and thaw by
flow cytometry (FCM). FCM can measure the size and the
fluorescence of each liposome. The results showed that
liposome size was hardly changed after freeze and thaw (data 
not shown) but the lipid markers on the nutrient liposome and 
RNA liposome were well mixed (Fig 1A, right). The lipid 
mixing without size change indicates that fusion and fission 
occurred during freeze and thaw. This result was consistent 
with our previous report (Tsuji et al., 2016). Second, we 
measured GFP fluorescence of liposomes by FCM and 8% of 
the total liposomes showed GFP signals only when +nutrient 
liposomes were fused with the RNA liposomes (Fig 1A 
middle, green dots). It should be noted that liposomes with 
GFP fluorescence appeared only in the region indicating the 
mixing of the two lipid markers (Fig 1A right, green dots). 
These data indicate that the two liposomes were fused and 
inner protein synthesis occurred by mixing of the inner 
solutions. 

Then we observed the liposomes with a confocal laser
microscope to check whether GFP synthesis occurred inside the
liposomes. The images show that GFP fluorescence was
observed inside the liposomes (Fig 1B). This fluorescence did
not appear when RNA liposomes were fused with -nutrient
liposomes. Therefore, we concluded that the components of the
PURE system can be supplied to the liposomes via freeze and
thaw without severe loss of protein activity.
Figure 1 GFP synthesis in liposomes

(A) FCM analysis of GFP synthesis after liposome fusion.
Vertical axes of all plots indicate the fluorescent intensity
(F.I.) of ATTO633 lipid marker. Horizontal axes of left two
columns show F.I. of GFP and right column shows F.I. of
ATTO390 lipid marker. Green dots indicate liposomes
synthesizing GFP (F.I. >100). (B) Microscopic images of 3h
incubated samples (scale bar 25 µm).



Materials 

1-Palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC)
was purchased from Avanti Polar Lipids (Alabaster, AL).
Liquid paraffin (0.86-0.89 g/mL at 20°C) was purchased from
Wako (Osaka, Japan). 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine
(DOPE) labeled with ATTO 633
(ATTO633) and DOPE labeled with ATTO 390 (ATTO390)
were purchased from ATTO-TEC (Siegen, Germany).

GFP synthesis in liposome 

The GFP synthesis in liposomes was induced by supplying the
PURE system (Kazuta et al., 2014) via the freeze-thaw
method. Liposome preparation and liposome fusion were
performed as described in our previous report (Tsuji et al., 2016).
Liposomes were prepared using POPC. RNA encoding GFP
(gfp-RNA) was prepared as described in the previous work by
Kazuta et al. The RNA liposomes in this experiment were
prepared by encapsulating 0.3 mM of each amino acid, 0.8
mM tRNA mix, 3.75 mM ATP, 2.5 mM GTP, 1.25 mM CTP,
1.25 mM UTP, 100 mM HEPES-KOH (pH 7.6), 280 mM
potassium glutamate, 1.5 mM spermidine, 19 mM magnesium
acetate, 2.5 mM phosphocreatine, 1.5 mM dithiothreitol, 0.01
µg/µl 10-formyl-tetrahydrofolate, 200 mM sucrose, and 2000
nM gfp-RNA. The nutrient liposomes were prepared by
encapsulating all constituents of the PURE system, including
ribosomes and other proteins required for protein synthesis.
The outer solution before freezing contained the same components
as the inner solution of the RNA liposomes, without RNA and
sucrose. Instead of sucrose, 200 mM glucose was added to the
outer solution. The outer solution during incubation contained
the same components as the outer solution before freezing,
except that tRNA was not included. After freeze and thaw,
liposome solutions were incubated at 37°C for 3 hours. The
RNA liposomes and nutrient liposomes were labeled with the
fluorescent lipid markers ATTO390 and ATTO633,
respectively.


Discussion 

We have established protein synthesis inside
liposomes by supplying nutrients from the outer environment via
liposome fusion. This study showed that our liposome fusion
method by freeze and thaw can be applied to supplying the
PURE system. Therefore, artificial reaction systems can be
designed by introducing the requisite genes, which should make it
possible to reconstruct flexible and extensible life-like structures.

In this report, only 8% of liposomes synthesized GFP, 
whereas the liposome fusion was observed in higher 
efficiency in our previous report (50%, Tsuji et al. 2016). This 
difference was in part because 39 elements were required for 
GFP synthesis, whereas only 5 elements were required for the 
previous work. Yet it is noteworthy that macromolecules
such as proteins and tRNAs, which cannot readily pass
through the membrane, can be supplied by freeze and
thaw. By developing a gene replication system compatible
with the presented freeze and thaw technique, we will be able
to perform “genotype-phenotype linked natural selection” in
artificial cells, as the simplest form of an “evolvable” protocell
model.


FCM and confocal microscopy analysis 

We performed FCM analysis and confocal microscopy as
previously reported (Tsuji et al., 2016).
Abstract

A novel genetic algorithm for evolving both uniform and
nonuniform cellular automata (CA) to perform user-defined
computations is presented. Unlike previous approaches, the
CAs evolved here can in general take as their input and their
output only a subset of the cells, allowing for the design of
CAs that are larger than the number of inputs required by the
desired computation. It also provides greater flexibility
compared with previous work in terms of the number of possible
outputs. We test our algorithm by attempting to evolve both
uniform and nonuniform 1D CAs of varying sizes to compute
the sum of two 4-bit strings, a computation requiring 8 inputs
and 5 outputs. Results demonstrate that while the algorithm
is unable to discover solutions using 8-cell CAs, expanding
the number of cells beyond the number of inputs enables the
autonomous design of 4-bit adders.

Introduction 

In their most basic form, cellular automata (CA) are a
collection of simple identical components (“cells”) whose
behaviors are governed by local interactions (Von Neumann et al.,
1966; Wolfram, 1984). Time in a CA is discrete and, at each
timestep, a cell can be in one of k states. If k = 2, for
example, a cell’s state can be either 0 or 1 at a given timestep. To
execute a CA, one must first “seed” each cell with an initial
value. At each subsequent timestep, the CA’s rule set
determines how a given cell’s state is updated based on the cell’s
current state and the states of its neighbors. The size of the
neighborhood of surrounding cells that can influence a given
cell’s state is determined by the CA’s radius r. If r = 1, for
example, a cell’s state is updated based on its current state
and the states of its adjoining neighbors. Example rule sets
for 1D CAs with r = 1 can be found in Figs. 1 and 3.

CAs typically have periodic boundary conditions, i.e.,
cells in 1D CAs are organized in a ring, cells in 2D CAs are
organized in a toroid, etc. Canonical CAs are uniform
(homogeneous), meaning that each cell’s state is updated using
the same rule set. However, nonuniform (nonhomogeneous)
CAs, where a cell’s state can be updated based on one of
two or more rule sets, have also been studied (e.g., Sipper, 1996).
Uniform 1D CAs with k = 2 and r = 1 are called elementary
CAs and it has been shown that at least one such CA,
“rule 110,” is Turing complete and thus capable of universal
computation (Cook, 2004).
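
As a minimal sketch of the update scheme described above, the following Python function advances an elementary CA (k = 2, r = 1) by one timestep on a ring. The function name `eca_step` and the use of Wolfram's rule numbering to encode the rule set as an integer are illustrative choices made here, not part of the work being described.

```python
def eca_step(cells, rule_number):
    """Advance a 1D binary CA (k = 2, r = 1) by one timestep on a ring."""
    n = len(cells)
    # Bit i of rule_number is the next state for the neighborhood whose
    # (left, centre, right) bits spell the integer i (Wolfram numbering).
    table = [(rule_number >> i) & 1 for i in range(8)]
    return [
        table[(cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n]]
        for i in range(n)
    ]


# Seed an 8-cell ring with a single 1 and iterate rule 110 a few times.
state = [0, 0, 0, 0, 0, 0, 0, 1]
for _ in range(4):
    print("".join(map(str, state)))
    state = eca_step(state, 110)
```

All cells are updated in parallel from the previous state, and the modulo indexing implements the periodic boundary condition.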

Owing to the size of the set of possible rule sets ($k^{k^{2r+1}}$
in the case of a uniform 1D CA), a brute-force search for
CAs that perform a specific computation is often intractable.
Therefore, an important area of research is the autonomous
design of CAs via genetic algorithms (GAs) and other forms
of evolutionary computation (Sapin et al., 2009; Cenek and
Mitchell, 2009).
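
To make the growth of this search space concrete, the short sketch below evaluates the count for a uniform 1D CA: there are k^(2r+1) possible neighborhood configurations, and each is mapped to one of k next states. The helper name `num_rule_sets` is hypothetical.

```python
def num_rule_sets(k, r):
    """Number of distinct rule sets for a uniform 1D CA: k ** (k ** (2r + 1))."""
    neighborhoods = k ** (2 * r + 1)  # distinct neighborhood configurations
    return k ** neighborhoods         # one of k next states per configuration


print(num_rule_sets(2, 1))  # elementary CAs (k = 2, r = 1)
print(num_rule_sets(2, 3))  # a larger radius, k = 2 and r = 3
```

Even for binary cells, moving from r = 1 to r = 3 raises the count from 2^8 to 2^128, which is why exhaustive enumeration quickly becomes impractical.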

Here we present a novel, flexible genetic algorithm for
evolving uniform and nonuniform CAs to perform user-defined
computations. A key feature of this algorithm is that
the number of inputs and outputs for the desired computation
are each allowed to vary independently from each other, as
well as from the number of cells in the CA, as long as the CA
is at least the same size as the larger of the two. We
demonstrate the capabilities of this algorithm by evolving 1D CAs
of various sizes to successfully compute the sum of any two
4-bit strings.
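
For reference, the target computation itself (8 input bits, 5 output bits) can be written down directly. The sketch below enumerates all 256 input cases; the bit ordering (two operands concatenated, most significant bit first) is an assumption made here for illustration, since the text does not fix an encoding, and `adder_target` is a hypothetical helper name.

```python
from itertools import product


def adder_target(inputs):
    """Map 8 input bits (two 4-bit operands) to the 5-bit sum."""
    a = int("".join(map(str, inputs[:4])), 2)  # first operand, MSB first (assumed)
    b = int("".join(map(str, inputs[4:])), 2)  # second operand, MSB first (assumed)
    return [int(bit) for bit in format(a + b, "05b")]


# Every possible pair of 4-bit strings together with its desired output.
cases = [(bits, adder_target(bits)) for bits in product((0, 1), repeat=8)]
print(len(cases))
```

A candidate CA could then be scored by how many of these cases its designated output cells reproduce.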

Related Work 

Nils Aall Barricelli was probably the first person to
experiment with evolving CAs, using one of the first computers
ever built (Barricelli, 1963). More recently, the work
in (Packard, 1988), as well as by the Evolving Cellular
Automata group at the Santa Fe Institute (Mitchell et al., 1996),
sparked a flurry of research into the use of genetic
algorithms to discover CAs capable of performing user-specified
computations that continues to this day (for a review, see
Iclanzan et al., 2011). In (Mitchell et al., 1993), for
example, a GA was employed to discover rule sets for uniform
149-cell CAs with k = 2 and r = 3 that solve the density
classification task (DCT). To solve the DCT, a CA that has
its cells seeded with a vector of bits must, after a fixed
number of iterations, resolve all of its cells to 1 in the case where
the original vector contained more than 50% 1s, otherwise
all of its cells should be 0. Thus the number of inputs to
the CA is the same as its size, and the output of the
computation is determined from the final state of all of the CA
cells. In (Sipper, 1996), nonuniform CAs were evolved to
solve the DCT task using a coevolutionary algorithm where
each cell had its own genome (rule set) and its own fitness
(evaluated independently from the other cells’ final states),
and selection and reproduction occurred only within local
neighborhoods. This work also demonstrated that for r = 1,
a nonuniform CA could be evolved to achieve a high score
on the DCT problem that is theoretically impossible to attain
with a uniform CA.
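
The success criterion of the density classification task described above can be stated compactly. The following sketch is only an illustration of that criterion; the helper names `dct_target` and `solves_dct` are hypothetical and do not come from the cited works.

```python
def dct_target(initial_cells):
    """Return the desired final state of every cell for the DCT."""
    # All cells should end at 1 only if more than 50% of the seed bits are 1.
    majority_ones = sum(initial_cells) * 2 > len(initial_cells)
    return [1 if majority_ones else 0] * len(initial_cells)


def solves_dct(final_cells, initial_cells):
    """True if the CA's final configuration matches the DCT target."""
    return list(final_cells) == dct_target(initial_cells)


print(dct_target([1, 1, 0]))
print(solves_dct([0, 0, 0], [1, 0, 0]))
```

Note that every cell is both an input and an output here, which is the constraint that the algorithm presented in this paper relaxes.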

CAs can be evolved to perform a wide range of computational
tasks by first framing the problem in a manner similar
to the DCT described above. For example, in the
synchronization task (e.g., Oliveira et al., 2009), the CA is tasked
with alternating between having all of its cells set to 1 and all
cells set to 0, after having initially been iterated for a fixed
number of timesteps starting from an arbitrary initial state.
An interesting property of these types of CA computations is
that global consensus among cells is achieved solely through
local interactions. Besides searching the space of possible
rule sets, genetic algorithms can also search for appropriate
neighborhood topologies for a given task, as in (Darabos
et al., 2013).

Conway’s Game of Life, a 2D CA, is also Turing complete
(Berlekamp et al., 2004). Computations in this case are
performed by interpreting the interactions of patterns called
“gliders” (Sapin et al., 2009). By searching for rules that
produce these patterns, GAs can discover novel CAs for
simulating logic gates such as AND and NOT gates (e.g., Sapin
et al., 2004, 2009).

Finally, controllable CAs (CCAs) are evolvable 1D
nonuniform CAs tasked with producing pseudorandom
numbers (Guan and Zhang, 2003). At each iteration, the
current state of a separate uniform CA (that uses a preselected
rule set) is used to determine which of two other
preprogrammed rule sets to apply to each cell in the CCA. In
addition, a second separate uniform CA (using a fourth
preselected rule set) is used to determine whether “controllable”
cells act as normal cells, or operate according to a predefined
behavior, such as “keep current state.” The genetic algorithm
is then tasked with determining which cells in the CCA are
controllable, and which are considered to be output cells.
The values of the output cells get translated into a number
after each iteration of the CCA, and after many iterations
the final set of numbers is evaluated on its randomness
to determine the fitness of the CCA in question. In the
algorithm presented below, the locations of input and output
cells are evolved in a manner similar to how the locations of
the controllable and output cells are evolved in CCAs.
However, contrary to CCAs, cells in the work presented here use
the same rule set at each iteration, and the number of
possible rule sets that can be applied to a cell, as well as the
rules themselves, are also evolvable. Furthermore, initial
conditions are also evolvable in this work, whereas they are
randomly generated in the case of CCAs.


EvoCA 

In an effort to expand the range of problems one can tackle
using evolved CAs, we present EvoCA, a genetic algorithm
that allows users to evolve CAs to perform computations
requiring a number of cells equal to or greater than the
number of inputs. Furthermore, EvoCA allows for the number
of output cells to vary from a single cell to having each cell
in the CA be an output. The number of allowed output cells
is independent from the number of inputs, and vice versa.
EvoCA implements these additional features by allowing the
genetic algorithm to choose a unique cell for each input and
output. While a given cell can only be assigned to at most
one input and at most one output, cells can be concurrently
designated as both an input and an output.

EvoCA is capable of evolving both uniform and nonuniform
CAs. If the user does not confine the search space to
uniform CAs, the evolutionary process is free to select the
number of potential rule sets that can be applied to a cell,
from one (i.e., a uniform CA) to a user-defined maximum.
The evolutionary process also selects which rule set is
applied to each cell in the CA.

A CA evolved using EvoCA performs a computation as
follows. The CA is initialized using the evolvable initial-value
vector stored in its genome (see below and Fig. 1). For
each cell that is linked to an input, its initial value is
overwritten by the current value of its corresponding input. Thus
if, for example, our computation takes one input and we set
the CA to have five cells in total, then the cell assigned to
the lone input will have its initial value defined by the value
of the input, while the other four cells will have their
initial values determined by the initial-value vector in the CA’s
genome. Once the CA is initialized in this fashion, it is run
for a user-defined number of iterations, with each cell
(including those designated as input and output cells) acting as
a “normal” cell, updating in parallel using the evolvable rule
set assigned to it in the genome. At the end of the fixed
number of CA iterations, the current values of the designated
output cells are taken as the outputs of the computation.
Example computations can be found in Fig. 2.
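
The computation procedure just described can be sketched in a few lines of Python. This is an illustrative reading of the text, not the authors' implementation: the function name `run_evoca`, the argument layout, the 0-based cell positions, and the convention of reading a neighborhood left to right as a binary index into the rule set are all assumptions made here.

```python
def run_evoca(initial_values, rule_sets, rule_of_cell, input_cells,
              output_cells, inputs, iterations, r=1):
    """Run one computation on a 1D binary CA with periodic boundaries."""
    state = list(initial_values)
    n = len(state)
    # Cells linked to an input have their initial value overwritten.
    for cell, value in zip(input_cells, inputs):
        state[cell] = value
    for _ in range(iterations):
        nxt = []
        for i in range(n):
            # Read the neighborhood left to right as a binary index
            # (assumed encoding of a rule set as a lookup table).
            idx = 0
            for d in range(-r, r + 1):
                idx = (idx << 1) | state[(i + d) % n]
            nxt.append(rule_sets[rule_of_cell[i]][idx])
        state = nxt  # all cells update in parallel
    # The designated output cells are read after the final iteration.
    return [state[cell] for cell in output_cells]


# Five cells, one input (cell 2) and one output (cell 0), uniform CA whose
# single rule set copies each cell's right-hand neighbor.
shift_left = [0, 1, 0, 1, 0, 1, 0, 1]
print(run_evoca([0] * 5, [shift_left], [0] * 5, [2], [0], [1], iterations=2))
```

In the usage example the input bit travels two cells to the left in two iterations, so it arrives at the output cell, which shows how an input can be read from a different cell than the one it was written to.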

An island model is used as part of the canonical EvoCA
algorithm. Island models enhance the canonical
single-population GA to evolve multiple reproductively isolated
populations in parallel, with periodic exchanges of genomes
between islands through “migration.” Implementing island
models is straightforward, adds little computational overhead,
and has been shown to significantly improve GA
performance (see Cohoon et al., 1987; Grouchy et al., 2009).

Users must define several parameters before beginning the
evolutionary process. A radius r must be provided, as well
as the size of the CA and the number of CA iterations per
computation. Users must also provide a maximum number
of rule sets, where choosing 1 forces the algorithm to search
only the space of uniform CAs. Besides these parameters
that apply to the CAs themselves, a variety of typical GA
parameters must also be set (see below).

Figure 1: An example EvoCA genome that encodes a 32-cell CA with 8 inputs and 5 outputs. This genome also has two rule
sets that differ by a single rule. This set of rule sets is one of the two most frequent sets to appear in discovered solutions. Input
cell locations are highlighted in light blue on the initial CA state vector, indicating that the encoded initial values for these cells
(shown in light gray) do not affect the CA’s behavior. This genome encodes the 4-bit adder whose example behaviors are shown
in Fig. 2.

The algorithm presented here evolves 1D CAs with k =
2 and periodic boundary conditions, although extending
EvoCA to evolve other types of CAs is possible.

Genome 

We may regard a CA $\alpha$ as a map $\alpha : A^L \to A^L$, where
$A$ is the alphabet of states on which the CA operates (for
binary CAs $A = \{0, 1\}$) and $L$ is the length of the CA. The
cardinality of $A$, $|A|$, is $k$, and the radius of operation for each
cell is set to a constant $r$, as described previously. However,
the actual function of interest that we wish to evolve may
be expressed as $\rho_\alpha : A^{L_I} \to A^{L_O}$, where $I$ represents the
subset of cells from $\alpha$ that serve as inputs and $O$ the subset of
cells that serve as outputs; $|I| = L_I \le L$ is accordingly the
number of desired inputs and $|O| = L_O \le L$ the number of
desired outputs. In the present embodiment of this concept,
the alphabet $A$, the radius $r$, the length of the CA $L$, and the
input and output sizes $L_I$ and $L_O$ are fixed. The locations of
input and output cells are mutable, although their numbers
are not.

Figure 1 shows an example EvoCA genome that encodes
a 32-cell CA for a computation requiring 8 inputs and 5
outputs (in fact, this genome encodes a 4-bit adder CA, see
below). There is one gene for each input and each output,
with each of these genes containing the position of its
associated cell represented as an integer in the range $[1, L]$.
The genome also contains one or more rule sets, each of
size $2^{2r+1}$. Owing to the fact that additional rule sets can
be added via mutation (see below), genomes in EvoCA are
variable-length. To determine which rule set to apply for
each cell (in the case of nonuniform CAs), genomes also
contain a vector of genes of length $L$, where each gene
corresponds to a cell in the CA and contains the unique
identification tag of the rule set to be applied to that cell. Finally,
genomes contain a second vector of genes of length $L$
containing initial values (either 0 or 1 for the work presented
here) for each cell. Note that the genes in this initial-value
vector that correspond to input cells are “neutral,” meaning
that they do not affect the behavior of the CA in any way.
This is owing to the fact that these initial values get
overwritten by input values before the initial iteration of the CA.
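
The genome layout described above can be summarized as a small data structure. The sketch below is a hypothetical representation for illustration only: the class name `Genome`, the field names, and the use of a list index as the rule set's "identification tag" are assumptions, not the authors' encoding.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Genome:
    input_cells: List[int]      # one gene per input: position of its cell in [1, L]
    output_cells: List[int]     # one gene per output: position of its cell in [1, L]
    rule_sets: List[List[int]]  # each rule set holds 2 ** (2r + 1) binary rules
    rule_of_cell: List[int]     # length L: tag (here, list index) of each cell's rule set
    initial_values: List[int]   # length L: initial state (0 or 1) of each cell


def is_valid(g: Genome, L: int, r: int) -> bool:
    """Check the structural constraints stated in the text."""
    return (
        len(set(g.input_cells)) == len(g.input_cells)        # unique cell per input
        and len(set(g.output_cells)) == len(g.output_cells)  # unique cell per output
        and all(1 <= c <= L for c in g.input_cells + g.output_cells)
        and all(len(rs) == 2 ** (2 * r + 1) for rs in g.rule_sets)
        and len(g.rule_of_cell) == L
        and all(0 <= t < len(g.rule_sets) for t in g.rule_of_cell)
        and len(g.initial_values) == L
    )


g = Genome([1, 3], [2], [[0] * 8], [0] * 5, [0, 1, 0, 1, 0])
print(is_valid(g, L=5, r=1))
```

A cell may appear in both `input_cells` and `output_cells`, which reflects the statement that a cell can be designated as an input and an output at the same time.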

Initialization. Before the evolutionary process can begin
(at generation 0), an initial population of genomes is needed.
These genomes are generated randomly as follows: Input
cells are randomly chosen one at a time without replacement
from the set of all CA cells. Output cells are chosen in the
same fashion. Initial genomes start with only a single rule
set, as this was found to produce the best results (see
Table 2). To generate a random rule set, a single value $\lambda$ is
chosen at random from the uniform distribution [0,1]. Then
for each rule in the rule set, a random value $\beta$ is drawn from
the same distribution as $\lambda$. If $\beta < \lambda$, then the current rule
will be set to 1, otherwise it will be set to 0. Using this
$\lambda$ term produces populations of genomes whose rule sets
are uniformly distributed across different densities of 1s, as
in (Sipper, 1996).

Since genomes are initialized with only a single rule set,
the vector in the genome responsible for assigning rule sets
to cells will be initialized with each gene pointing to the
same initial rule set. However, in experiments where the
number of initial rule sets is allowed to vary, a rule set is
chosen at random (with uniform probability) for each cell.

Finally, the initial-value vector is randomly initialized in
the same fashion as the initial rule set, i.e., using a randomly
selected $\lambda$ term to determine the expected density of 1s.
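
The density-based initialization used for both the initial rule set and the initial-value vector can be sketched as follows. The function name `random_bits` and the optional `rng` argument are illustrative assumptions; the procedure itself follows the description above (one density value per vector, one uniform draw per bit).

```python
import random


def random_bits(length, rng=random):
    """Draw a bit vector whose expected density of 1s is itself uniform on [0, 1]."""
    lam = rng.random()  # one density value per vector
    # One fresh uniform draw per bit; the bit is 1 when the draw falls below lam.
    return [1 if rng.random() < lam else 0 for _ in range(length)]


rule_set = random_bits(2 ** (2 * 1 + 1))  # r = 1 gives 8 rules
initial_values = random_bits(32)          # e.g. a 32-cell CA
print(rule_set, initial_values)
```

Drawing every bit with a fixed probability of 0.5 would concentrate the population around half-filled vectors; drawing the density first spreads the initial population over sparse and dense rule sets alike.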

Reproduction. To produce the next generation of CA 
genomes (offspring) from the current (parent) generation, 
relatively high-fitness parent genomes are cloned. If 
crossover is to be used during the creation of an offspring 
genome, a second high-fitness parent genome is selected and 
merged with the offspring genome that was cloned from the 
first parent. The original offspring genome is preserved from 
its beginning (i.e., the top-left gene in Fig 1) to a randomly 
selected crossover point on its genome (moving from left 
to right and top to bottom along the various parts of the 
genome, as organized in Fig 1). From the crossover point 
onwards, the remaining genes on the offspring genome are 
replaced with their corresponding genes from the second 
parent’s genome. Note that if the offspring genome contains 
more rule sets than the second parent’s genome, its additional 
rule sets will be preserved regardless of the location 
of the crossover point. Conversely, if the second parent’s 
genome contains more rule sets than the offspring genome, 
the second parent’s additional rule sets will be ignored during 
crossover. Finally, when copying an input gene from the second 
parent to the offspring genome, if it is found that the second 
parent’s gene points to a cell that is already associated with a 
previous input gene on the offspring genome, crossover does not 
occur for the gene in question, leaving the offspring’s original 
gene unmodified. This rule is also applied when copying 
output genes. 
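
The core of the crossover step can be illustrated with a deliberately simplified sketch. It treats a genome as a flat list of genes, which is an assumption: the actual EvoCA genome has several parts (Fig 1), and the special handling of input and output genes is omitted here. The name `single_point_crossover` is hypothetical.

```python
import random


def single_point_crossover(clone, second_parent, rng):
    """Splice `second_parent` into `clone` from a random crossover point on.

    Simplification: genomes are flat lists. Genes of the clone that have no
    counterpart in a shorter second parent are preserved, and extra genes of
    a longer second parent are ignored.
    """
    point = rng.randrange(len(clone) + 1)  # crossover point, may be either end
    child = list(clone[:point])            # genes before the point are kept
    for k in range(point, len(clone)):
        if k < len(second_parent):
            child.append(second_parent[k])  # take the second parent's gene
        else:
            child.append(clone[k])          # nothing to copy, keep the original
    return child, point


child, point = single_point_crossover([0] * 6, [1] * 4, random.Random(1))
```

The offspring always keeps the length of the cloned genome, which mirrors the statement that additional rule sets of the offspring are preserved while those of the second parent are ignored.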

Mutations. The resulting offspring genomes (either asexually 
cloned or generated via sexual reproduction, i.e., 
crossover) are then subject to one or more of a variety of 
potential mutations: 

• Rules in an offspring genome’s one or more rule sets can 
be modified by a bit-flip mutation. 

• If the genome does not already contain the user-defined 
maximum number of rule sets, a new rule set can be created 
through a mutation. Rules in the new rule set are 
randomly generated as when rule sets are first initialized 
at generation 0 (see above). 

• Input genes can be assigned to a new, randomly selected 
CA cell that is not already designated as an input cell. A 
mutation can also cause two input genes to swap associated 
cells. Finally, a mutation can cause an input gene to 
become associated with a cell adjacent to the cell that it 
is currently associated with. If the adjacent cell is already 
designated as an input, the next adjacent cell is selected 
instead (this process will repeat until a cell that is not 
currently associated with an input is found). 

• Output genes are subject to the same three types of mutation 
as input genes. 

• All genes in the initial-value vector are subject to bit-flip 
mutations. 

• If a genome contains two or more rule sets, each gene that 
determines which rule set to apply to its associated cell is 
subject to a mutation that randomly selects a new rule set 
to govern its cell’s behavior. 

• If a genome contains two or more rule sets and at least 
one of these rule sets is not associated with any cells, the 
genome can undergo a mutation that removes all rule sets 
that are not in use. 
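
The adjacent-cell mutation for input (and output) genes in the list above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper name `shift_to_adjacent` is hypothetical, the direction of the shift is passed in by the caller, and the search wraps around the lattice because the CA has periodic boundary conditions.

```python
def shift_to_adjacent(cell, used, size, step):
    """Return the nearest cell in direction `step` (+1 or -1) not in `used`.

    `used` is the set of cells already designated as inputs (or outputs).
    Indices wrap around a lattice of `size` cells. If every other cell is
    taken, the original cell is returned unchanged (an assumption; the paper
    does not describe this corner case).
    """
    for offset in range(1, size):
        candidate = (cell + step * offset) % size
        if candidate not in used:
            return candidate
    return cell


# Cell 3 moves right; cells 4 and 5 are occupied, so it lands on cell 6.
new_cell = shift_to_adjacent(3, {3, 4, 5}, 8, +1)
```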

Mutations to initial-value genes that are associated with 
cells designated as inputs are “neutral” in the sense that they 
will not modify the behavior of the CA. This is again owing 
to the fact that the initial values of input cells are overwritten 
by their associated input’s value. These neutral genes may 
be expressed in future generations, however, as a mutation 
may change the cell associated with an input, causing the 
former input cell to be initialized to its previously neutral 
initial-value gene at the beginning of a CA computation. 

It should be noted that a mutation that adds a rule set to 
an EvoCA genome does not force it to be used by any of the 
CA’s cells, and thus is also a neutral mutation. However, 
a future mutation may associate one or more cells with this 
new rule set, thus expressing the originally neutral mutation. 
Genomes with two or more rule sets may also end up with 
neutral rule sets through the accumulation of mutations that 
disassociate their cells from a previously used rule set (by 
associating them with other rule sets in the genome). When 
a rule set is not associated with any cells, its rules are still 
subject to bit-flip mutations. However, since these rules are 
not used by the CA, these mutations will also be neutral, 
perhaps getting expressed in future generations if genomes 
evolve to (re)use this rule set. 

4-bit Adder Experiments 

To demonstrate the capabilities of EvoCA, we attempt to 
evolve CAs that function as 4-bit adders¹. This requires 8 
inputs to the CA (the two 4-bit strings to be summed) and 5 
outputs (the resulting 4-bit sum and the final carry bit). This 
problem is interesting in that at least four computational 
operations must be done in the correct order before a correct 
answer can be produced. Furthermore, to allow carry bits 
from previous operations to be passed to subsequent operations, 
a solution to this problem will necessitate some form 
of memory. 

For all of the experiments described here, a minimal radius 
of r = 1 is used: Each cell has access to only its own 
state and that of its two adjacent neighbors. Each CA in the 
population is evaluated on all possible combinations of inputs 
(2^8 = 256 total training cases). For each input string, 
the CA being evaluated has the Hamming distance between 
its 5-bit output string and the correct 5-bit answer added to 
its fitness, which is initially set to 0. This is therefore a 
minimization problem, and a 4-bit adder CA will have a fitness 
of 0. 
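
The fitness computation described above can be sketched as follows. The names `adder_target`, `hamming`, and `fitness` are hypothetical, and the bit ordering (most significant bit first, with the carry bit leading the 5-bit answer) is an assumption of this sketch. The argument `compute` stands in for running an evolved CA on one pair of 4-bit inputs.

```python
from itertools import product


def bits_to_int(bits):
    """Interpret a sequence of bits as an unsigned integer, MSB first."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def adder_target(a_bits, b_bits):
    """Correct 5-bit answer (carry bit first) for two 4-bit inputs."""
    total = bits_to_int(a_bits) + bits_to_int(b_bits)
    return [(total >> shift) & 1 for shift in range(4, -1, -1)]


def hamming(u, v):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(u, v))


def fitness(compute):
    """Sum of Hamming distances over all 2**8 = 256 training cases."""
    total = 0
    for bits in product((0, 1), repeat=8):
        a_bits, b_bits = bits[:4], bits[4:]
        total += hamming(compute(a_bits, b_bits), adder_target(a_bits, b_bits))
    return total


perfect = fitness(adder_target)  # a correct adder scores 0
```

Under this scheme the worst possible fitness is 256 × 5 = 1280, reached only if every output bit is wrong on every training case.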

¹ The source code for these experiments is freely available at 
https://github.com/pgrouchy/EvoCA 

We use 100 islands arranged in a ring, with each island 
having a population of 100 CAs, for a total population size 
of 10,000 CAs per run of EvoCA. Islands evolve in parallel 
and are reproductively isolated, except for occasional migration 
events. In the experiments presented here, migration is 
implemented as follows: islands receive 5 randomly selected 
genomes from the island to their right every 10 generations 
and these 5 incoming genomes overwrite randomly selected 
preexisting genomes in the population. 
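
A minimal sketch of one migration event on the ring is given below. It assumes that "the island to the right" is the island at the next list index (wrapping around), that migrants are copied rather than removed from their home island, and that all migrants are chosen before any island is overwritten. The function name `migrate` is hypothetical.

```python
import random


def migrate(islands, n_migrants, rng):
    """Perform one ring migration in place.

    `islands` is a list of populations (lists of genomes). Each island
    receives `n_migrants` randomly selected genomes from its right-hand
    neighbor, and these overwrite randomly selected resident genomes.
    """
    # Choose all migrants first so that the exchange is simultaneous.
    outgoing = [rng.sample(pop, n_migrants) for pop in islands]
    for idx, pop in enumerate(islands):
        incoming = outgoing[(idx + 1) % len(islands)]
        slots = rng.sample(range(len(pop)), n_migrants)
        for slot, genome in zip(slots, incoming):
            # Genomes are immutable strings here; mutable genomes
            # should be deep-copied before insertion.
            pop[slot] = genome


islands = [[f"i{k}g{j}" for j in range(10)] for k in range(4)]
migrate(islands, 5, random.Random(0))
```

In the experiments reported here this step would be invoked every 10 generations with 100 islands of 100 genomes each.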

On each island, tournament selection, where the CA 
genome with the best fitness (smallest Hamming distance) 
is selected from a group of randomly chosen CA genomes, 
is used to select parents for reproduction. A tournament size 
of 10 is used for all experiments in this paper. To produce an 
offspring genome, a parent genome selected via tournament 
selection is cloned. If crossover is to be applied, tournament 
selection is run a second time to select a second parent 
whose genome will be spliced with the offspring genome. 
Offspring genomes are also subject to a variety of mutations 
(see above). The current top CA on an island is copied 
into the offspring population without modification (elitism). 
Otherwise, crossover is applied with a probability of p_c and 
each gene of offspring i is subject to mutation with a probability 
of 1/S_i, where S_i is the size of the offspring’s genome. 
There is a 1% chance that all unused rule sets (i.e., rule sets 
that are not applied to any of the CA’s cells) will be removed 
from an offspring genome. 
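
Tournament selection and the per-gene mutation rate can be sketched as follows. Sampling contestants without replacement and restricting the mutation example to bit-valued genes are assumptions of this sketch; the names `tournament_select` and `mutate_bits` are hypothetical.

```python
import random


def tournament_select(population, fitnesses, k, rng):
    """Return the genome with the lowest fitness among k random contestants.

    Fitness is minimized, so the smallest summed Hamming distance wins.
    """
    contestants = rng.sample(range(len(population)), k)
    best = min(contestants, key=lambda idx: fitnesses[idx])
    return population[best]


def mutate_bits(genome, rng):
    """Flip each bit with probability 1/S, where S is the genome size."""
    rate = 1.0 / len(genome)
    return [1 - gene if rng.random() < rate else gene for gene in genome]


rng = random.Random(3)
population = list(range(20))                  # stand-in genomes
fitnesses = [abs(g - 7) for g in population]  # stand-in fitness values
parent = tournament_select(population, fitnesses, 10, rng)
child = mutate_bits([0, 1] * 8, rng)
```

With a rate of 1/S_i, an offspring receives one mutation on average regardless of how many rule sets its genome contains.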

A variety of experiments are run to investigate the influence 
of various parameters of EvoCA on its performance on 
this problem. For each parameter configuration, 100 runs 
with varying initial populations are performed and all runs 
last for 1,000 generations. 

Results and Discussion 

The results from all performed experiments are summarized 
in Tables 1, 2, and 3. The fitness value of a given run 
is reported as the lowest fitness achieved by any genome 
throughout the entire evolutionary run. P values are calculated 
using the two-tailed Wilcoxon rank-sum test. Note 
that preliminary experiments (see Table 3) determined that 
EvoCA performs better without crossover enabled (i.e., 
p_c = 0), therefore all experiments reported here are with 
p_c = 0, except where noted. 

The results in Table 1 demonstrate the need for additional 
cells beyond the number of inputs. When L is equal to the 
number of inputs (i.e., L = 8), EvoCA is unable to find a 
CA that can correctly compute the addition of every possible 
combination of two 4-bit strings. Furthermore, increasing 
the number of CA iterations has no effect on performance. 
By increasing L beyond the number of inputs, however, 
significant performance improvements are achieved 
and EvoCA is able to discover nonuniform CAs that are 4-bit 
adders (no solutions employing uniform CAs emerged 
from any of the reported experiments). These results also 


[Figure 2 image; surviving panel titles: 1011 + 1001 and 1111 + 1111] 

Figure 2: Three different computations from the successfully 
evolved 4-bit adder CA described by the genome in 
Fig 1. Cells whose states are updated using rule set “a” are 
white, while cells whose states are updated using rule set “b” 
are gray. 
CA size   # CA iterations   # success   Fitness μ   Fitness σ   Rank-sum 
8         8                 0           158.43      41.06       P > 0.999 
8         32                0           156.16      38.37       P < 0.001 
24        24                0           70.36       52.51       P < 0.020 
24        32                1           51.77       42.92       P < 0.015 
32        24                11          80.76       62.95       P < 0.060 
32        32                21          64.09       59.50       P < 0.017 
32        40                10          82.31       61.94       P > 0.965 
40        40                11          87.12       60.67       P < 0.016 
40        32                6           108.51      53.79 

Table 1: Summary of the results from experiments where the 
CA size and the number of CA iterations were varied. Mean 
and standard deviation are labeled as μ and σ respectively. 
All runs are mutation-only (i.e., p_c = 0) and use the island 
model described in the main text. 

show that there is a point at which adding additional cells 
no longer affects performance (e.g., when going from 32 
cells to 40 cells), while continuing to increase computational 
costs. 

Why would EvoCA only be able to fully solve this problem 
using CAs with more cells than inputs? One reason 
could be that with only 8 cells, the GA gets stuck in local 
minima. By adding additional cells, the GA can exploit 
added dimensions in the search space to escape such minima. 
Another possibility is that additional cells beyond the 
number of inputs allow EvoCA to better control the order 
and timing of key computations during the CA iterations, 
e.g., to control when the carry bit from the addition of the 
least significant input bits is added into the addition of the 
next two input bits. Lending evidence to this hypothesis 
is the fact that in all successful results one finds that while 
two inputs representing the same bit from the two 4-bit input 
strings (e.g., one input is the most significant bit from 
the first input string and the second input is the most significant 
bit from the second input string) are often found associated 
with adjoining cells, there are always at least two 
noninput cells separating inputs from different locations in 
the bit strings (e.g., Figs. 1, 2, and 4). Since r = 1 in all 
experiments, this separation ensures that cells associated with 
different bit locations in the input strings will not influence 
each other’s initial state updates. Moreover, by having additional 
cells beyond the number of inputs, the EvoCA algorithm 
can exert greater control over which rules get applied 
at the initial iteration via the evolvable initial values. It is 
likely that all of the aforementioned hypotheses contribute 
to EvoCA’s success when the number of CA cells is greater 
than the number of inputs, although further experimentation 
is required before conclusions can be drawn. 

The data in Table 1 also yield no obvious rule for choosing 
the number of CA iterations to maximize EvoCA performance 
given a specific CA size. This indicates that future 
versions of EvoCA should consider allowing the number of 
iterations to be evolvable, alongside the other parameters 
already incorporated into the genome. 

The experiments summarized in Table 2 explore EvoCA 
performance with and without restrictions on the use of 
nonuniform CAs. The top two rows show the results from 
experiments where nonuniform CAs are allowed to evolve, 
the difference being that the data in the first row are from 
experiments where populations were initialized with uniform 
CAs only, while the data in the second row are from 
experiments where populations are initialized with a variety 
of uniform and nonuniform CAs. The third row of data 
are from experiments where only uniform CAs could be 
evolved. These results clearly demonstrate that allowing 
nonuniform CAs to evolve is necessary for the success of 
the EvoCA algorithm on this task. Furthermore, significant 
performance improvements are achieved by restricting the 
initial population of CAs to be uniform only. This is to be 
expected, however, as additional rule sets increase the size 

CA type   # init. rule sets   # success   Fitness μ   Fitness σ   Rank-sum 
n-u       1                   21          64.09       59.50       P < 0.001 
n-u       [1-8]               2           129.20      45.27       P < 0.001 
u         1                   0           175.18      46.43 

Table 2: Summary of the results from experiments that explored 
GA performance with and without restrictions on the 
use of nonuniform CAs. Note that the top-most row of data 
are reproduced from Table 1 for comparison. Mean and 
standard deviation are labeled as μ and σ respectively. 
Experiments with a CA type of “n-u” allow both uniform and 
nonuniform CAs to evolve, while experiments with a CA 
type of “u” are restricted to uniform CAs only. All runs are 
mutation-only (i.e., p_c = 0) and use the island model 
described in the main text. 
p_c   Island model   # success   Fitness μ   Fitness σ   Rank-sum 
0.0   yes            21          64.09       59.50       P < 0.001 
0.3   yes            3           96.56       58.40       P > 0.853 
0.7   yes            8           96.84       66.02       P < 0.001 
0.7   no             1           203.11      65.14       P < 0.003 
0.0   no             1           174.41      58.09 

Table 3: Summary of the results from experiments where the 
CA size and the number of CA iterations were both fixed 
at 32, while the percent of offspring genomes created using 
crossover p_c and the type of GA used (island model or 
canonical) are varied. Note that the top-most row of data are 
reproduced from Table 1 for comparison. Mean and standard 
deviation are labeled as μ and σ respectively. 

of the genome, and thus the size of the search space (e.g., 
Stanley and Miikkulainen, 2002). 

The experiments summarized in Table 3 explore EvoCA 
performance when varying whether or not an island model 
and/or crossover are used. These results demonstrate that 
significant performance improvements are achieved using 
the island model described here. Surprisingly, these results 
also show that using the crossover method described here 
significantly reduces performance, and thus best results are 
achieved using mutation-only (i.e., p_c = 0) EvoCA. There 
are many possible reasons for such a result. Perhaps the 
single-point crossover mechanism as described needs to be 
Figure 3: This set of rule sets and the one shown in Fig. 1 
are the two most common sets among the 4-bit adder CAs 
discovered by EvoCA. An example behavior of a solution 
that employs these rule sets is shown in Fig. 4. 
Figure 4: An example behavior from an evolved 4-bit adder 
that employs the set of rule sets shown in Fig. 3. Cells whose 
states are updated using rule set “c” are white, while cells 
whose states are updated using rule set “d” are gray. 

refined. Or perhaps single-point crossover works better with 
a different arrangement of genes in the genome than the one 
used here (see Fig. 1). Better results might also be achieved 
by developing an alternative method of crossover, perhaps 
based on uniform crossover (where crossover is applied at 
each gene independently, thus the arrangement of genes in 
the genome has no effect on performance). Finally, it is possible 
that this problem domain, or even the search spaces 
engendered by the EvoCA algorithm in general, are best 
searched with mutation-only GAs. Again, conclusions cannot 
be drawn without further experimentation. 

While the positions of the input and output cells vary considerably 
between solutions (which is to be expected considering 
the CA has periodic boundary conditions and addition 
is commutative), 85.3% of all successfully evolved 4-bit 
adders contain one of two sets of rule sets: the two rule sets 
shown in Fig. 1, and the two rule sets shown in Fig. 3. A 
sample behavior from a solution that employs the set of rule 
sets shown in Fig. 3 is shown in Fig. 4. Two types of solutions 
with three rule sets were also discovered, as well as an 
additional solution with two rule sets for a 40-cell CA. No 
uniform solutions or solutions with four or more rule sets 
emerged. 

Conclusions and Future Work 

We have presented EvoCA, a novel genetic algorithm for 
evolving both uniform and nonuniform CAs to perform 
user-defined computations. This algorithm allows the size of the 
CA to be greater than the number of inputs, a necessity when 
searching for solutions to the 4-bit adder problem described 
here, and contrary to previous work. Furthermore, the number 
of outputs is free to vary from 1 to the size of the CA, 
independent of the number of inputs. The canonical version 
of EvoCA uses an island model to significantly improve the 
performance of the GA, although the presented crossover 
mechanism was found to be significantly detrimental to 
performance. Thus, future work should look to further investigate 
crossover mechanisms. Another avenue of research 
is to explore the trade-offs between increased computational 
complexity and increased performance when incorporating 
GA variants such as novelty search (Lehman and Stanley, 
2008) and speciating GAs (e.g., Grouchy et al., 2009) into 
EvoCA. Of course, EvoCA has only proven itself on a single 
problem thus far; future work should therefore apply EvoCA 
to many other problems. 

Finally, the EvoCA runs presented here were computationally 
intensive, with each run testing 10,000 CAs on 256 
different training cases per generation, for 1,000 generations. 
Thus an important step towards applying EvoCA to 
more challenging problems will be to implement it on GPUs, 
something that should be relatively straightforward owing to 
the inherently parallel nature of both genetic algorithms and 
CAs (Zaloudek et al., 2010). 
Abstract 

This study aims to present a new idea to expand the perception 
area of each cell in a cellular automaton (CA) using a 
recursive algorithm known as “Recursive Estimation of Neighbors.” 
An intelligent cellular process defined by the algorithm 
makes it possible to introduce an extra radius of the perception 
area, in addition to the radius of the CA neighborhood. 
A basic CA rule is extrapolated into rules with larger radii, 
which form a sequence indexed by the extra radius containing 
the basic CA as the first term of the sequence. The patterns 
formed in some typical sequences of extrapolated ECA 
and Life-like CA rules are presented. Contrasting pattern 
activities contained in homogeneous and heterogeneous CAs 
are discussed by applying mean field analysis. Some symmetrical 
arrangements of composites of cells with different 
radii are used in order to discuss the heterogeneous CA. The 
new perspective presented here offers several possible 
applications for CA. 

Introduction 

A cellular automaton (CA) is characterized by a large number 
of cells and a synchronous update of all cell states according 
to a local rule. As a result, CA has been used to describe the 
complexity emerging from interactions among simple individuals 
following simple rules. The original concept of CA 
was introduced by von Neumann and Ulam for modeling 
biological self-reproduction (Neuman, 1966). In the 1970s, 
Conway developed a two-dimensional CA rule, which he 
called “the Game of Life,” that exhibited complex behaviors 
evoking biological activities (Gardner, 1970; Berlekamp, 
Conway, and Guy, 1982). In the 1980s, Wolfram studied 
one-dimensional CAs (Wolfram, 1983, 1984, 1986, 2002), 
and proposed that CA could be grouped into four classes 
of complexity: homogeneous (class I), periodic (class II), 
chaotic (class III), and complex (class IV). This study 
mainly discusses the possibility of extending the intelligence 
of each cell and presents a new perspective on CA for discussing 
the relationship between information processing and pattern 
formation. 

If intelligence can be described as the ability to perceive 
information and use it to form adaptive behaviors within an 
environment, each cell of a CA does not seem intelligent because 
it does not incorporate a process for organizing the information 
to determine its own future state. It is all the more notable 
that even such a simple model can simulate complex patterns 
reminiscent of biological activities. In contrast, in a flocking 
boids simulation, which is a typical model of a multi-agent 
system developed by Reynolds (Reynolds, 1987; Banks, 
Vincent, and Anyakoha, 2007), each boid obtains the motion 
information of other boids within its perception area and, using 
a simple algorithm, alters its own motion according to 
an analysis of this information. Boids easily organize themselves 
into a large, orderly group and move as a single organism 
without a central commander, i.e., their information 
processing leads to a collective control of the group motion. 
There seems to be a crucial difference between the two models, 
CA and the boids. 

A framework inspired by that of boids, named 
the Recursive Estimation of Neighbors (REN), was introduced 
into CA to construct a model for studying the relationship 
between information processing and pattern formation 
in collective systems at the 21st AROB international conference 
(Kayama, 2016). In the present study, this idea is 
clarified and the formations of some interesting patterns in 
the studied CAs are investigated. The new model does not go 
beyond the framework of CA; rather, it is a reconstruction 
or reinterpretation of it. A basic CA rule with a unit rule 
radius is extrapolated into rules with larger radii through the 
REN algorithm. The extrapolated rules form a sequence indexed 
by an extra radius that represents the size of the perception 
area of a cell. The sequence contains the basic CA 
as its first term. We call a CA model comprising cells 
with different values of the extra radius heterogeneous; 
some such models show interesting pattern formations. Contrasting 
examples in two-dimensional CA are discussed through 
the mean field analysis. 

The next section shows how the intelligent process of each 
cell can be implemented using the REN algorithm, through 
the introduction of the extra radius in addition to the radius 
of the basic CA neighborhood. In Section 3, some typical 
sequences of the extrapolated CA rules in one-dimensional 
elementary CA (ECA) and two-dimensional eight-neighbor 
outer-totalistic CA including Conway’s Game of Life (Life-like 
CA) are presented. Sequences of ECA #22 and #110 
modify their complexity between periodic and chaotic patterns 
depending on the even-odd parity of their extra index. 
In the extrapolated Game of Life sequence, there is a positive 
correlation between the extra radius and the average convergence 
speed to a rest state; however, in another sample 
sequence, there is a negative correlation. Such contrasting 
activities are interpreted by applying the mean field analysis 
in Section 4. Some interesting patterns are formed in heterogeneous 
models; their lattices are composed of cells that 
follow different extrapolated rules over a given sequence. 
Some geometrical arrangements of such different types of 
cells form composite models, which provide new application 
possibilities for CA. 

Recursive Estimation of Neighbors 

Practice is an activity to improve skills. When acquiring a 
skill, we initially try to recognize it as a set or some sequence 
of small actions. After repeated practice, such a process becomes 
unconscious, and the skill can be used just as a reflex 
action, in which no intelligent activities seem to be involved. 
An experienced person can deal with a lot of information almost 
automatically, whereas a beginner is likely to come to a 
standstill when faced with it. If a CA rule is viewed in the light 
of this psychological insight, it might be possible to represent 
the rule by a set or some sequence of processes. 

In the case of boids, each boid acquires information regarding 
the positions and velocities of boids within its perception 
area and determines its own movement in order to follow 
the representative values of the neighbors. The radius of the 
perception area can be treated as a parameter expressing the 
differences between individual elements. In order to incorporate 
a similar scenario in CA, the perception area of a cell 
should be separated from the neighborhood determined by 
the CA rule, so that the size of the area can be treated as an 
attribute of each cell. Under the CA framework, however, 
the neighborhood of each cell is defined by the CA rule, and 
there is no possibility of expanding the sensory area of a 
cell. For example, each cell of ECA acquires the states of 
the three cells within its radius-one neighborhood to determine 
its state in the next timestep. The above psychological 
discussion suggests that such a separation is possible if 
the update process of each cell has an intermediate process 
of estimating the next states of neighboring cells, as follows: 

Acquire information of neighbors => estimate their next 
states => determine its own next state 

Estimation and determination of states are assumed to be 
processed by only a basic CA rule because if other rules or 
mechanisms were introduced, the present framework would 
become complicated and it would be difficult to find a reasonable 
method for selecting them. Moreover, here we assume “self-similarity,” 
which means that all cells use the same update algorithm. 
Then, the basic CA rule is expected to be used recursively. 


Following the above discussion, the target framework includes 
the perception area for each cell in addition to the 
basic CA neighborhood. The states of all cells within the 
area are perceived in each timestep. Here we assume that 
the basic CA neighborhood and the perception area are both 
isotropic and can be parametrized by their radii r and R: 

r: radius of the basic CA neighborhood shared by all cells. 

R: radius depending on the size of the perception area of 
each cell. 


The intelligent process of each cell is implemented by the 
REN algorithm, in which the basic CA rule is recursively 
used to estimate the states of the neighboring cells. The perceptual 
information of the states of cells inside the perception 
area of a cell should be consumed in an intelligent manner 
in the update process of the cell. The recursive usage of 
the basic CA rule continues until the state of the target cell 
is subsequently determined. Note that in the estimation process 
the values of the extra radius R of neighboring cells are 
assumed by the target cell and the assumed values are not 
necessarily identical with their actual ones. 

The REN algorithm is defined as follows: 

(i) In the estimation process, it is assumed that the same algorithm 
of recursive estimation is used by all neighboring 
cells (self-similarity). Within the perception area, each 
neighboring cell is assumed to have a perception area that 
is as large as possible. 

(ii) In cases where the assumed perception area of a cell is 
smaller than the neighborhood of the basic CA rule (R < 
r), the state of the cell in the next timestep is assumed to 
remain equal to its current state (termination condition). 


Demonstrating the implementation in ECA, which is the 
simplest one-dimensional binary CA with r = 1, will help 
clarify the REN concept. Here we suppose that an ECA 
model is homogeneous, which means that all cells have the 
same value of R. 

The state of the i-th cell at timestep t and the ECA rule 
function are denoted by $x_i^{(t)}$ and $f$, respectively. The 
standard time evolution of the state is expressed by 

$$x_i^{(t+1)} = f\left(x_{i-1}^{(t)}, x_i^{(t)}, x_{i+1}^{(t)}\right) \qquad (1)$$
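
For concreteness, the standard ECA update of Eq. 1 can be sketched in a few lines. The helper names `eca_rule` and `step` are hypothetical, the rule function is built from Wolfram's rule number, and periodic boundary conditions are an assumption of this sketch.

```python
def eca_rule(number):
    """Build the rule function f(left, center, right) from a rule number.

    Bit k of `number` is the output for the neighborhood whose three
    states, read as a binary number, equal k.
    """
    def f(left, center, right):
        return (number >> ((left << 2) | (center << 1) | right)) & 1
    return f


def step(cells, f):
    """Apply Eq. 1 synchronously to every cell (periodic boundaries)."""
    n = len(cells)
    return [f(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])
            for i in range(n)]


f110 = eca_rule(110)
next_cells = step([0, 0, 0, 1, 0], f110)  # -> [0, 0, 1, 1, 0]
```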


The new framework requires that the states of all neighboring 
cells at t + 1 are estimated by the ECA rule. Then the 
above expression changes to 

$$x_i^{(t+1)} = \psi_{R_0,i}^{(t+1)} = f\left(\psi_{R_0-1,i-1}^{(t+1)}, x_i^{(t)}, \psi_{R_0-1,i+1}^{(t+1)}\right) \qquad (2)$$

where $\psi_{R_0,i}^{(t+1)}$ is an estimated state of the i-th cell at t + 1 
with radius $R = R_0$, and $\psi_{R_0-1,i\pm1}^{(t+1)}$ are estimated states 
of the adjacent neighbors at t + 1 with an assumed radius 
$R_0 - 1$; the value of the neighbors’ radius stems from the 
above definition (i) because $R_0 - 1$ is the maximum value 
of the perception area for the neighbors within the perception 
area of the i-th cell with radius $R_0$. Note that $\psi_{R_0,i}^{(t+1)}$ 
is assigned to the actual state $x_i^{(t+1)}$, but $\psi_{R_0-1,i\pm1}^{(t+1)}$ are not 
necessarily equal to their respective actual states $x_{i\pm1}^{(t+1)}$ because 
the neighbors’ true radius is not $R_0 - 1$, but $R_0$. Subsequently, 
the definition (i) leads to the following recursive 
expressions of the estimated states of the neighbors: 

$$\psi_{R_0-j,i-j}^{(t+1)} = f\left(\psi_{R_0-j-1,i-j-1}^{(t+1)}, x_{i-j}^{(t)}, \psi_{R_0-j-1,i-j+1}^{(t+1)}\right) \qquad (3)$$

$$\psi_{R_0-j,i+j}^{(t+1)} = f\left(\psi_{R_0-j-1,i+j-1}^{(t+1)}, x_{i+j}^{(t)}, \psi_{R_0-j-1,i+j+1}^{(t+1)}\right) \qquad (4)$$

where $j = 1, 2, \ldots, R_0 - 1$. Because $j = R_0$ implies that 
the estimated value of R equals 0 (< r = 1), the definition 
(ii) gives the following conditions: 

$$\psi_{0,i\pm R_0}^{(t+1)} = x_{i\pm R_0}^{(t)}, \qquad \psi_{0,i\pm R_0\mp2}^{(t+1)} = x_{i\pm R_0\mp2}^{(t)} \qquad (5)$$

which terminate the above recursiveness. 
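
The recursion and its termination condition can be illustrated with a short sketch for a homogeneous ECA. The names `eca_rule` and `ren_step` are hypothetical, periodic boundaries are assumed, and the same recursion is applied to every estimated cell at every level, which is an assumption for radii above 2. The direct recursion costs on the order of 2^R rule evaluations per cell, so the sketch is meant for small R only.

```python
def eca_rule(number):
    """Rule function f(left, center, right) for a given ECA rule number."""
    def f(left, center, right):
        return (number >> ((left << 2) | (center << 1) | right)) & 1
    return f


def ren_step(cells, f, radius):
    """One synchronous update in which every cell has extra radius `radius`."""
    n = len(cells)

    def psi(level, k):
        # Termination condition: an assumed radius below r = 1 means the
        # estimated next state equals the current state.
        if level < 1:
            return cells[k % n]
        # The two neighbors are estimated with the largest radius that
        # still fits inside the perception area, i.e. level - 1.
        return f(psi(level - 1, k - 1), cells[k % n], psi(level - 1, k + 1))

    return [psi(radius, i) for i in range(n)]


cells = [0, 1, 1, 0, 1, 0, 0, 1]
f = eca_rule(110)
basic = ren_step(cells, f, 1)         # identical to the basic ECA update
extrapolated = ren_step(cells, f, 2)  # a five-neighbor rule built from f
```

With `radius = 1` the recursion bottoms out immediately and reproduces Eq. 1, while `radius = 2` composes the basic rule with itself over a five-cell window.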

As the first step of a concrete demonstration, let us consider 
the case $R_0 = r = 1$. Equations 2 and 5 give 
$x_i^{(t+1)} = f(\psi_{0,i-1}^{(t+1)}, x_i^{(t)}, \psi_{0,i+1}^{(t+1)})$ and 
$\psi_{0,i\pm1}^{(t+1)} = x_{i\pm1}^{(t)}$. Accordingly, 
the ECA model with $R_0 = 1$ is identical to the 
basic ECA (Eq. 1). This discussion is not restricted to ECA; 
all CA models with $R_0 = r$ are identical to their basic 
CA. We next discuss the case $R_0 = 2$. Equation 2 gives 
$x_i^{(t+1)} = f(\psi_{1,i-1}^{(t+1)}, x_i^{(t)}, \psi_{1,i+1}^{(t+1)})$ and the recursive 
expressions 3 and 4 give 


Sequence of Extrapolated CA rules 

When a basic CA is assigned to a code AT, a sequence of 
extrapolated rules from the basic CA with index R is repre¬ 
sented by [AT]. If each CA included in the sequence should 
be identified, its code is shown as the basic CA code fol¬ 
lowed by the letter “R” and its value n. The sequence is rep¬ 
resented by [A/’]={A/’R1, AfR2, AfR3, ...}, where AfRl is 
identical with the basic CA. Each cell that belongs to AfRn 
is called an “AfRn cell.” 

The implementation of REN discussed in the previous 
section can be applied to Life-like CA. A pattern that does 
not change from one generation to the next is known as a 
“still life” in the Game of Life and other CA. The following 
property of the sequence of extrapolated CAs is proved: 

• A still life in AfRl is also a still life in any CA included 
in the sequence [AT], when it is sufficiently isolated. 

“Sufficiently isolated” means that any perception area of 
cells that form the located still life contains no active cells 
other than the member of the still life. To prove this property, 
we adopt the notation of the previous section. The general¬ 
ization to other cases, e.g., Life-like CA, is straightforward. 

(1) A pattern SL in NR1 is assumed to be a still life, which means that for any cell i in SL,

$$x_i^{(t+1)} = f\left(x_{i-1}^{(t)}, x_i^{(t)}, x_{i+1}^{(t)}\right) = x_i^{(t)}.$$

(2) In NR2, any state $x_i$ follows

$$x_i^{(t+1)} = f\left(\psi_{1,i-1}^{(t+1)}, x_i^{(t)}, \psi_{1,i+1}^{(t+1)}\right).$$

From (1) and the assumption “sufficiently isolated,” $x_{i\pm1}^{(t+1)} = \psi_{1,i\pm1}^{(t+1)} = x_{i\pm1}^{(t)}$ are also satisfied in SL. Then,

$$x_i^{(t+1)} = f\left(x_{i-1}^{(t)}, x_i^{(t)}, x_{i+1}^{(t)}\right) = x_i^{(t)}.$$

(3) If SL is assumed to be a still life in NRn, then

$$x_i^{(t+1)} = x_i^{(t)}.$$

(4) In NR(n + 1), any state $x_i$ follows

$$x_i^{(t+1)} = f\left(\psi_{n,i-1}^{(t+1)}, x_i^{(t)}, \psi_{n,i+1}^{(t+1)}\right).$$

From (3) and the assumption “sufficiently isolated,” $x_{i\pm1}^{(t+1)} = \psi_{n,i\pm1}^{(t+1)} = x_{i\pm1}^{(t)}$ are also satisfied in SL. Then from (1),

$$x_i^{(t+1)} = f\left(x_{i-1}^{(t)}, x_i^{(t)}, x_{i+1}^{(t)}\right) = x_i^{(t)}.$$

$$\psi_{1,i-1}^{(t+1)} = f\left(\psi_{0,i-2}^{(t)}, x_{i-1}^{(t)}, \psi_{0,i}^{(t)}\right), \quad (6)$$

$$\psi_{1,i+1}^{(t+1)} = f\left(\psi_{0,i}^{(t)}, x_{i+1}^{(t)}, \psi_{0,i+2}^{(t)}\right). \quad (7)$$

Since $\psi_{0,i}^{(t)} = x_i^{(t)}$ and $\psi_{0,i\pm2}^{(t)} = x_{i\pm2}^{(t)}$ from the termination conditions (Eqs. 5), the CA model with $R_0 = 2$ is expressed by

$$x_i^{(t+1)} = f\left(f\left(x_{i-2}^{(t)}, x_{i-1}^{(t)}, x_i^{(t)}\right), x_i^{(t)}, f\left(x_i^{(t)}, x_{i+1}^{(t)}, x_{i+2}^{(t)}\right)\right), \quad (8)$$

which corresponds to a five-neighbor CA rule, $x_i^{(t+1)} = g(x_{i-2}, x_{i-1}, x_i, x_{i+1}, x_{i+2})$. Equation 8 is an “intelligent” expression of the rule g. The cases for larger values of R can be derived similarly.

The above discussion indicates that the REN algorithm with increasing sizes of perception areas indexed by R extrapolates the basic CA rule to rules with larger radius r = 2 × R + 1. The extrapolated rules from the basic CA form a sequence parametrized by R. Namely, a CA rule included in the sequence can be reconstructed from the basic CA through REN, where each cell has its own perception area and acts as an intelligent agent.
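The recursion behind Eq. 8 and the still-life property can be checked numerically. The following is a minimal sketch for a one-dimensional periodic lattice. It assumes that a cell with index R applies the basic rule f to its own current state and to estimates of its two neighbors computed with index R − 1, with the current state as the termination condition. The function names (`eca_rule`, `estimate`, `ren_step`) are illustrative and do not come from the paper.

```python
def eca_rule(number):
    """Return f(left, center, right) for the ECA rule with this Wolfram number."""
    def f(left, center, right):
        return (number >> (4 * left + 2 * center + right)) & 1
    return f


def estimate(f, x, j, k):
    """Estimated next state of cell j using k levels of recursion (assumed form).

    k == 0 is the termination condition: the estimate is the current state.
    """
    n = len(x)
    if k == 0:
        return x[j % n]
    return f(estimate(f, x, j - 1, k - 1), x[j % n], estimate(f, x, j + 1, k - 1))


def ren_step(f, x, R):
    """One synchronous update of a periodic lattice in which every cell uses index R."""
    return [estimate(f, x, i, R) for i in range(len(x))]


if __name__ == "__main__":
    # Under rule #4 an isolated live cell is a still life of the basic CA.
    f = eca_rule(4)
    still = [0] * 5 + [1] + [0] * 5
    for R in range(1, 5):
        print(R, ren_step(f, still, R) == still)
```

With R = 1 the sketch reduces to the basic rule, and with R = 2 it reproduces the nested form of Eq. 8. The recursion is exponential in R, so it is only meant for small illustrations.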

Although pattern formations in sequences of extrapolated 
ECA and Life-like CA were already demonstrated in 
Kayama, 2016, some typical examples are presented in the 
following subsections.

Figure 1: Patterns in [#134]: R1, R2, R3, R4 (left to right).

Figure 2: Patterns in [#30]: R1, R2, R3, R4 (left to right).

Figure 4: Patterns in [#22]: R1, R2, R3, R4 (left to right).

Figure 5: Patterns in [#110]: R1, R2, R3, R4 (left to right).


Extrapolated ECA 

ECA is the simplest nontrivial CA with r = 1; its $2^3 = 8$ different neighborhood configurations result in $2^8 = 256$ possible rules. We follow the standard naming convention invented by Wolfram (Wolfram, 1983, 2002), which assigns each ECA rule a number from #0 to #255. The equivalence of the CA rules under mirror and complementary transformations reduces the number of independent rules to 88 (Li and Packard, 1990; Kayama, 2011). In the simulations used in this subsection, we set the maximum value of R to 20. Among the sequences generated from independent ECA rules, eight are based on class I rules, and all of the rules contained in these sequences also belong to class I. In contrast, various pattern formations can be found in sequences based on class II rules. The sequence [#134] shows changes between periodic and chaotic patterns depending on the index R (Fig. 1), where the colored dots are live cells and the black ones are dead. The patterns originate from pseudo-randomly generated initial configurations in which the initial probability of live cells is set to 0.5. Pattern formations in sequences based on class III and IV rules are also of interest. Class III rules are often exemplified by rule #30, and its sequence [#30] exhibits chaotic patterns (Fig. 2). In particular, the patterns with large R values (#30R3 and R4) are typical ones; for example, the pattern of #90R2 in Fig. 3a cannot be distinguished from them, whereas the pattern of rule #18 is sparser (Fig. 3b). Some sequences alternate between periodic and chaotic patterns depending on the even-odd parity of R, e.g., [#22] (Fig. 4) and [#110] (Fig. 5). In these cases, no simple correlations are found between fluctuation control and the radius R, even when the amount of information each cell acquires increases monotonically with R.

Figure 3: Patterns in (a) [#90] and (b) [#18]. (a) Left: #90R1, right: #90R2. (b) Left: #18R1, right: #18R2.
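The count of 88 independent rules can be reproduced directly from the Wolfram numbering. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the usual convention that bit k = 4·left + 2·center + right of the rule number gives the output for that neighborhood.

```python
def rule_table(number):
    """Outputs of an ECA rule indexed by k = 4*left + 2*center + right."""
    return [(number >> k) & 1 for k in range(8)]


def table_to_number(table):
    """Inverse of rule_table."""
    return sum(bit << k for k, bit in enumerate(table))


def mirror(number):
    """Rule obtained by exchanging the left and right neighbors."""
    t = rule_table(number)
    out = [0] * 8
    for k in range(8):
        left, center, right = (k >> 2) & 1, (k >> 1) & 1, k & 1
        out[k] = t[4 * right + 2 * center + left]
    return table_to_number(out)


def complement(number):
    """Rule obtained by exchanging the roles of states 0 and 1."""
    t = rule_table(number)
    return table_to_number([1 - t[7 - k] for k in range(8)])


def canonical(number):
    """Smallest rule number in the equivalence class of `number`."""
    return min(number, mirror(number), complement(number),
               mirror(complement(number)))


def independent_rules():
    """Sorted representatives of all classes under mirror and complement."""
    return sorted({canonical(n) for n in range(256)})


if __name__ == "__main__":
    print(len(independent_rules()))
```

Taking the smallest rule number of each class as its representative is a choice made for this sketch; other conventions select different representatives but give the same number of classes.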

Extrapolated Life-like CA 

In the descriptions below, all Life-like CA rules are specified in the Golly/RLE format (Adamatzky (Ed.), 2010; Eppstein, 2010). The Game of Life is denoted by B3S23 in this notation, where “B” stands for “birth” and “S” stands for “survival.” In the Game of Life, many complex patterns and activities can emerge (Callahan, 1995; Flammenkamp, 1998). After a long transient process, a randomly generated initial configuration settles into a rest state that can include various patterns: still lifes (e.g., blocks, beehives, or ships) and oscillators (e.g., blinkers, toads, or beacons). Any isolated still life is also a still life in any extrapolated rule, as proved above. Figure 6a shows the rest states of [B3S23]. In contrast, [B23S234] has a stable state only in B23S234R1 and random states in all others (Fig. 6b), and their randomness gradually increases with R. These contrasting examples are investigated through mean field analysis in the next section.
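The B/S notation maps directly onto a short update routine for the basic (R1) rule. The following sketch is an illustration rather than the simulator used for the figures: it stores live cells as a set of coordinates on an unbounded grid, and the names `parse_rule` and `step` are hypothetical. Rules containing B0 are not handled by this representation.

```python
def parse_rule(rulestring):
    """Split a rulestring such as 'B3S23' or 'B3/S23' into birth and survival sets."""
    text = rulestring.upper().replace("/", "")
    birth_part, survival_part = text.lstrip("B").split("S")
    return {int(c) for c in birth_part}, {int(c) for c in survival_part}


def step(live, rulestring):
    """Advance a set of live (row, col) cells by one generation."""
    birth, survival = parse_rule(rulestring)
    counts = {}
    for (r, c) in live:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr or dc:
                    key = (r + dr, c + dc)
                    counts[key] = counts.get(key, 0) + 1
    new_live = set()
    # Live cells without live neighbors do not appear in counts, so add them.
    for cell in set(counts) | set(live):
        n = counts.get(cell, 0)
        if cell in live and n in survival:
            new_live.add(cell)
        elif cell not in live and n in birth:
            new_live.add(cell)
    return new_live


if __name__ == "__main__":
    block = {(0, 0), (0, 1), (1, 0), (1, 1)}
    print(step(block, "B3S23") == block)  # a block is a still life in B3S23
```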

Figure 6: Patterns in homogeneous (a) [B3S23] and (b) [B23S234]: R1, R2, R3, R4 (left to right).

Mixing Cells with Different R

Thus far, only homogeneous CAs have been discussed, in which all the cells follow the same rule with the same value of R. If we note that the extra radius R is an index of the amount of information each cell can acquire, R can be regarded as a characteristic of a cell, and heterogeneous CAs composed of cells with different values of R become meaningful. Figures 7a and 7b present the mixing of cells with different values of R in [B3S23] and [B23S234], respectively. The mixing ratio between two kinds of cells, Rn:R(n + 1), changes linearly from 1:0 to 0:1. Although homogeneous B3S23R1 and R2 have rest states as shown in Fig. 6a, Fig. 7a shows an emergence of unstable states in their intermediate mixing area. In contrast, Fig. 7b shows that all mixing areas of [B23S234] become stable. The difference between them is also discussed in the next section.

Figure 7: Mixings of cells with different R in (a) [B3S23]: R1=>R2=>R3=>R4 and (b) [B23S234]: R1=>R2=>R3=>R4.

A heterogeneous CA in [B4S1234] is also interesting. Figure 8 shows a complex pattern change; areas of islands and random states are sandwiched between walls. The mixing ratio is the same as in the case of Fig. 7, but the initial probability of live cells is set to 0.2.

Figure 8: Mixing of cells with different R in [B4S1234]: R1=>R2=>R3=>R4.


Mean Field Analysis

Mean field analysis assumes that the iterative application of a rule does not introduce correlations between states of cells in different positions when applied to CA (Wolfram, 1983; Schulman and Seiden, 1978). This assumption is generally not valid, but it allows the derivation of a simple formula for investigating the qualitative behavior of CA dynamics and an estimation of the limit density of the possible states of a cell. Mean field analysis is applied to the contrasting examples of [B3S23] and [B23S234] presented in the previous section.

Homogeneous CA

A Life-like CA rule is determined by the numbers of “births” and “survivals.” Here they are symbolized by bth and svl, respectively. The time evolution relationship (Eq. 2) is rewritten in the mean field analysis as follows:

$$p'_{R_0} = (1 - p_{R_0})\, B(p'_{eR_0-1}, bth) + p_{R_0}\, S(p'_{eR_0-1}, svl), \quad (9)$$

where $p$ and $p'$ are the densities of the live cells at the present and at the next timestep, respectively, and $p'_{eR_0-1}$ is the estimated density of live cells. The functions B and S refer to the contributions from the eight neighboring cells according to the rule. The recursive expressions, Eqs. 3 and 4, lead to

$$p'_{eR_0-j} = (1 - p_{R_0})\, B(p'_{eR_0-j-1}, bth) + p_{R_0}\, S(p'_{eR_0-j-1}, svl), \quad (10)$$

where $j = 1, 2, \ldots, R_0 - 1$. Because the termination conditions (Eqs. 5) give $p'_{e0} = p_{R_0}$, Eq. 10 leads to

$$p'_{e1} = (1 - p_{R_0})\, B(p_{R_0}, bth) + p_{R_0}\, S(p_{R_0}, svl). \quad (11)$$

Recursive use of Eqs. 10 and 11 in Eq. 9 yields a relation between $p'_{R_0}$ and $p_{R_0}$. In B3S23, the Game of Life, the functions B and S are expressed by

$$B(p, 3) = \binom{8}{3} p^3 (1 - p)^5,$$

$$S(p, 23) = B(p, 3) + \binom{8}{2} p^2 (1 - p)^6.$$

Figure 9: Cobweb plots of [B3S23].

Mean field diagrams and cobweb plots of B3S23R1 and R4 are presented in Fig. 9, where the number of timesteps for reaching the limit density from the initial density 0.5 decreases with the value of R. This result means that there can be a positive correlation between the average convergence speed and the value of R, namely the size of the perception area. Indeed, Fig. 10 shows that the average transient time from fifty pseudo-randomly generated initial configurations on a 200 × 160 lattice to rest states decreases with R.¹ In contrast, the mean field diagrams and the cobweb plots of [B23S234] become complex with the value of R (Fig. 11). Accordingly, the cell states become unstable (Fig. 6b). In other words, the REN algorithm in [B3S23] consumes information of cell states efficiently to control the fluctuations of the states, but that in [B23S234] disturbs the cell states.

Figure 10: Semi-log plots of transient times in [B3S23]. Each error bar indicates the standard deviation.

Figure 11: Cobweb plots of [B23S234]: (a) B23S234R1, (b) B23S234R4.
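The homogeneous mean field map for B3S23 is short enough to iterate directly. The sketch below follows Eqs. 9 to 11 as read here: the estimate starts from the current density (the termination condition) and the one-step map is applied R times with the cell's own density held fixed. The function names are illustrative, and the code is a hedged reading of the equations rather than the author's program.

```python
from math import comb


def B(p):
    """Birth contribution for B3: exactly three of eight neighbors are live."""
    return comb(8, 3) * p**3 * (1 - p)**5


def S(p):
    """Survival contribution for S23: two or three of eight neighbors are live."""
    return B(p) + comb(8, 2) * p**2 * (1 - p)**6


def next_density(p, R):
    """Density at the next timestep for homogeneous B3S23 with index R."""
    estimate = p  # termination condition: the zeroth estimate is the current density
    for _ in range(R):
        estimate = (1 - p) * B(estimate) + p * S(estimate)
    return estimate


def iterate(p0, R, steps):
    """Trajectory of densities, the raw data for a cobweb plot."""
    trajectory = [p0]
    for _ in range(steps):
        trajectory.append(next_density(trajectory[-1], R))
    return trajectory


if __name__ == "__main__":
    for R in (1, 4):
        print(R, iterate(0.5, R, 10))
```

Plotting `next_density(p, R)` against `p` gives the mean field diagram, and the successive pairs of `iterate` give the cobweb path.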

Heterogeneous CA 

As pointed out in Section 3.3, the pattern formations in the heterogeneous CAs in [B3S23] and [B23S234] are contrasting. In order to discuss them through mean field analysis, some geometrical symmetry is required in the arrangement of cells with different R.

The definition (i) of REN results in a specific composite configuration of cells. If an R2 cell is surrounded by eight R1 cells (Fig. 12a), the estimation of the states of the neighboring cells from the center cell is always correct, which means that the nine cells can be considered as one composite with $2^9$ states. In the symmetrical arrangement of Fig. 12b, all R1 cells have seven R1 and one R2 neighbors, and all R2 cells have eight R1 neighbors. These two kinds of cells can then be represented in the mean field analysis by two densities, $p_1$ and $p_2$, respectively. $p_1$ satisfies the following expression:

$$p'_1 = (1 - p_1)\, B(p_1, p_2, bth) + p_1\, S(p_1, p_2, svl), \quad (12)$$

where the termination conditions are taken into account. B and S are expressed in [B3S23] as follows:

$$B(p_1, p_2, 3) = \binom{7}{2} p_1^2 (1 - p_1)^5 p_2 + \binom{7}{3} p_1^3 (1 - p_1)^4 (1 - p_2),$$

$$S(p_1, p_2, 23) = B(p_1, p_2, 3) + \binom{7}{1} p_1 (1 - p_1)^6 p_2 + \binom{7}{2} p_1^2 (1 - p_1)^5 (1 - p_2).$$

The expression of $p_2$ comes from Eq. 9 as follows:

$$p'_2 = (1 - p_2)\, B(p_{e1}, bth) + p_2\, S(p_{e1}, svl), \quad (13)$$

where the density of the estimated state $p_{e1}$ is identical with $p'_1$, because the estimation of the states of the neighboring R1 cells from the R2 cell is always correct. Numerical iteration of Eqs. 12 and 13 in [B3S23] leads to Fig. 13, which shows that the symmetrical arrangement of Fig. 12b has no essential differences from the homogeneous B3S23R2. In contrast, the plots of the numerical results for the arrangement of B23S234R1 and R2 cells and for homogeneous B23S234R2 are totally different (Fig. 14). Correlation between R1 and R2 cells mutually suppresses their fluctuations, and their densities approach a limit value. These results correspond to pattern formations in the symmetrical arrangements of B3S23R1 and R2 cells and of B23S234R1 and R2 cells. The former reaches a rest state and the latter a stable one.

In order to understand the emergence of unstable states in the mixing area between B3S23R1 and R2 in Fig. 7a, we adopt another symmetrical arrangement, that of Fig. 12c. Indeed, the pattern formation in this arrangement is unstable (Fig. 15a). The arrangement requires two kinds of R1 cells and one R2 cell. If the two kinds of R1 cells are represented by Ra and Rb, they are distinguished by their neighbors: an Ra cell has six Rb and two R2 neighbors, and an Rb cell has three Ra, two Rb, and three R2 neighbors. An R2 cell has two Ra and six Rb neighbors. If the densities of the three types of cells are denoted by $p_a$, $p_b$, and $p_2$, they satisfy the following expressions:

$$p'_a = (1 - p_a)\, B_a(p_b, p_2, 3) + p_a\, S_a(p_b, p_2, 23), \quad (14)$$

$$p'_b = (1 - p_b)\, B_b(p_a, p_b, p_2, 3) + p_b\, S_b(p_a, p_b, p_2, 23), \quad (15)$$

$$p'_2 = (1 - p_2)\, B_2(p_a, p_b, 3) + p_2\, S_2(p_a, p_b, 23), \quad (16)$$

where the above functions, Bs and Ss, are presented in Table 1. The plot of the numerical results is shown in Fig. 15b. The existence of a gap between $p_a$ and $p_b$ represents the existence of the unstable area.

From the above discussion, mean field analysis can be effective for investigating the qualitative behavior of heterogeneous CA, if some symmetrical arrangements of the composites are adopted.

Figure 12: (a) Composite configuration comprising one center cell (R = 2; blue) and eight neighboring cells (R = 1; green), and its symmetrical arrangements; the ratios of the numbers of R1 and R2 cells are (b) 8:1 and (c) 4:1.

Figure 13: Plots of iterations in (a) homogeneous B3S23R2 and (b) composite of R1:R2=8:1.

Figure 14: Plots of iterations in (a) homogeneous B23S234R2 and (b) arrangement of B23S234R1:R2=8:1.

Figure 15: (a) Unstable pattern in the symmetrical arrangement (Fig. 12c) of B3S23R1:R2=4:1 composites and (b) its plot of iterations.

¹ The long transient time at R = 6 is the only exception, owing to a glider (Kayama, 2016).
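Equations 12 and 13 for the 8:1 arrangement in [B3S23] can be iterated in a few lines. The sketch below is a hedged reading of those two equations; the function names are illustrative, and the 4:1 arrangement of Eqs. 14 to 16 is not covered because its contribution functions (Table 1) are not reproduced here.

```python
from math import comb


def B_hom(p):
    """Homogeneous birth term: three of eight neighbors are live."""
    return comb(8, 3) * p**3 * (1 - p)**5


def S_hom(p):
    """Homogeneous survival term: two or three of eight neighbors are live."""
    return B_hom(p) + comb(8, 2) * p**2 * (1 - p)**6


def B_mix(p1, p2):
    """Birth term for an R1 cell with seven R1 neighbors and one R2 neighbor."""
    return (comb(7, 2) * p1**2 * (1 - p1)**5 * p2
            + comb(7, 3) * p1**3 * (1 - p1)**4 * (1 - p2))


def S_mix(p1, p2):
    """Survival term for the same R1 cell."""
    return (B_mix(p1, p2)
            + comb(7, 1) * p1 * (1 - p1)**6 * p2
            + comb(7, 2) * p1**2 * (1 - p1)**5 * (1 - p2))


def step(p1, p2):
    """One iteration of Eqs. 12 and 13."""
    new_p1 = (1 - p1) * B_mix(p1, p2) + p1 * S_mix(p1, p2)   # Eq. 12
    new_p2 = (1 - p2) * B_hom(new_p1) + p2 * S_hom(new_p1)   # Eq. 13, p_e1 = p'_1
    return new_p1, new_p2


if __name__ == "__main__":
    p1, p2 = 0.5, 0.5
    for t in range(10):
        p1, p2 = step(p1, p2)
        print(t + 1, p1, p2)
```

A useful consistency check is that the mixed terms reduce to the homogeneous ones when the two densities are equal, since $\binom{7}{2} + \binom{7}{3} = \binom{8}{3}$ and $\binom{7}{1} + \binom{7}{2} = \binom{8}{2}$.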


Conclusions 

The REN algorithm defines the intelligent process of each cell by introducing an extra index R that represents the radius of the perception area of a cell, in addition to the radius of the CA neighborhood. A basic CA rule with a unit rule radius is extrapolated into rules with larger radii r = 2 × R + 1, which form a sequence indexed by R containing the basic CA as its first term. Pattern formations in some typical sequences of the extrapolated ECA and Life-like CA rules are presented. It is proven that a still life in the basic CA is also a still life in any extrapolated rule of the same sequence when the still life is sufficiently isolated. The sequences of the Game of Life ([B3S23]) and [B23S234] exhibit contrasting activities of cell states, which are discussed through mean field analysis. Their mean field diagrams show opposite effects of extrapolation as R increases. Namely, the REN algorithm in [B3S23] consumes information about the cell states efficiently in order to control the fluctuations of the states, but that in [B23S234] disturbs the cell states. The pattern activities of the heterogeneous models mixing R1 and R2 cells in [B3S23] and [B23S234] are also contrasting. Some symmetrical arrangements of the composites containing R2 cells surrounded by eight R1 cells are examined to discuss the models in the mean field analysis. The unstable area between B3S23R1 and R2 may emerge from the gap between the limit densities of the two kinds of R1 cells. The stable area between B23S234R1 and R2 comes from the mutual suppression of their fluctuations; correspondingly, mean field analysis of their densities shows that they approach the same limit value. Furthermore, mean field analysis is effective in investigating the qualitative behavior of heterogeneous CA, not just homogeneous CA.

The new perspective on CA presented here has several potential applications. Heterogeneous CAs with combinations of cells with different values of R show unexpected pattern activities. Such phenomena resemble the mixing of two different materials, in which the end material has a state changed by the chemical process. The symmetrical arrangements of composites could be related to crystal substances in solid-state physics. If interactions between cells and an evolutionary algorithm are introduced in the heterogeneous CA, a new theoretical field, like “CA Chemistry,” could be established, just as the boids theory was developed into “Swarm Chemistry” (Sayama, 2007, 2009, 2010).

Table 1: Contribution functions Bs and Ss in mean field analysis for the symmetrical arrangement of B3S23R1 and R2 cells (Eqs. 14-16).


Acknowledgments

The author wishes to thank Y. Imamura and all the anonymous reviewers for valuable comments and suggestions.

References

Adamatzky, A., editor (2010). Game of Life Cellular Automata. London: Springer.

Banks, A., Vincent, J., and Anyakoha, C. (2007). A review of particle swarm optimization, part I: background and development. Natural Computing, 6(4):467-484.

Berlekamp, E. R., Conway, J. H., and Guy, R. K. (1982). Winning Ways for Your Mathematical Plays. Academic, New York.

Callahan, P. (1995). Patterns, programs, and links for Conway’s game of life. http://www.radicaleye.com/lifepage/. Retrieved February 1, 2011.

Eppstein, D. (2010). Growth and decay in life-like cellular automata. In Adamatzky, A., editor, Game of Life Cellular Automata, pages 71-98. Springer.

Flammenkamp, A. (1998). Achim’s game of life. http://wwwhomes.uni-bielefeld.de/achim/gol.html. Retrieved December 12, 2011.

Gardner, M. (1970). Mathematical games. Scientific American, 223:102-123.

Kayama, Y. (2011). Network representation of cellular automata. In 2011 IEEE Symposium on Artificial Life (IEEE ALIFE 2011) at SSCI2011, pages 194-202.

Kayama, Y. (2016). Extension of cellular automata by introducing an algorithm of recursive estimation of neighbors. In Proceedings of the 21-st International Symposium on Artificial Life and Robotics, pages 73-77.

Li, W. and Packard, N. (1990). The structure of the elementary cellular automata rule space. Complex Systems, 4:281-297.

Reynolds, C. W. (1987). Flocks, herds and schools: A distributed behavioral model. ACM Siggraph Computer Graphics, 21(4):25-34.

Sayama, H. (2007). Decentralized control and interactive design methods for large-scale heterogeneous self-organizing swarms. Advances in Artificial Life, 15(1):105-114.

Sayama, H. (2009). Swarm chemistry. Artificial Life, 15(1):105-114.

Sayama, H. (2010). Robust morphogenesis of robotic swarms. Computational Intelligence Magazine, IEEE, 5(3):43-49.

Schulman, L. S. and Seiden, P. E. (1978). Statistical mechanics of a dynamical system based on Conway’s game of life. Journal of Statistical Physics, 19(3):293-314.

von Neumann, J. (1966). The theory of self-reproducing automata. In Burks, A. W., editor, Essays on Cellular Automata. University of Illinois Press.

Wolfram, S. (1983). Statistical mechanics of cellular automata. Rev. Mod. Phys., 55:601-644.

Wolfram, S. (1984). Universality and complexity in cellular automata. Physica D, 10:1-35.

Wolfram, S. (1986). Theory and Applications of Cellular Automata. World Scientific, Singapore.

Wolfram, S. (2002). A New Kind of Science. Wolfram Media, Inc.




Ant Geometers 


Sean Luke, Katherine Russell, and Bryan Hoyle 

George Mason University 
sean@cs.gmu.edu 


Abstract 

Just how much can a pheromone-enabled swarm do? Moti¬ 
vated by robotic construction, we set out to show that a swarm 
of computationally simple ants, communicating only via 
pheromones, can in fact perform classic compass-straightedge 
geometry, and thus can make many shapes and perform many 
nontrivial geometric tasks. The ants do not need specially-designed
stigmergic building materials, a prepared environment,
local or global direct communication facilities (such as
radio or line-of-sight signaling), or any localization beyond 
initial starting points for drawing. We describe the proof of 
concept in replicable detail. We then note that its accuracy and 
efficiency can be greatly improved through augmentation with 
a simple embeddable broadcast mechanism. 

Introduction 

One of the biggest difficulties in swarm robotics, and in 
swarm agent simulation in general, lies in how to commu¬ 
nicate and coordinate. Due to their large numbers, swarm 
agents often cannot communicate through a common broad¬ 
cast medium such as radio, both because they would over¬ 
whelm the medium, and because it would require every agent 
to receive and deal with messages from all N other agents. 
Instead artificial swarm schemes often use either local com¬ 
munication or indirect communication, whereby agents leave 
messages for one another — virtual breadcrumbs, if you will. 

The biggest source of inspiration for indirect communi¬ 
cation in robotics is surely the use of pheromones by ants, 
termites, etc. to coordinate behaviors. These insects are 
known to use pheromones for many tasks: but most swarm 
robotics and swarm simulation literature has focused on their 
most famous use, namely establishing foraging trails. 

In prior work we have demonstrated trail optimiza¬ 
tion, adaptation to environmental changes, and even multi¬ 
waypoint, self-intersecting tours using only pheromones and 
swarms of very simple agents (Panait and Luke, 2004). Our 
later work extended the pheromone model to swarm foraging 
robots which store and read pheromone information in intel¬ 
ligent breadcrumbs (in the form of wireless sensor motes), 
and then deploy, move, and retrieve these devices from the 
environment (Hrolenok et al., 2010; Russell et al., 2015). 


The use of pheromones to build foraging trails is straight¬ 
forward and well studied. We are instead interested in show¬ 
ing that pheromones can be used for something much more 
ambitious. Our research area is collective building construc¬ 
tion, and among the first tasks in construction is the laying 
out of survey lines to define the form of the object being built. 
As such, we have chosen to demonstrate, as an elaborate 
proof of concept, that swarms of very simple agents, commu¬ 
nicating only via pheromones, can achieve all the operations 
necessary to perform collective compass-straightedge con¬ 
struction (or “classical construction”) from geometry, and 
thus can build many nontrivial geometric shapes. 

This is not easy: some of the basic operations are challeng¬ 
ing to achieve and, while we show that such operations are 
possible, they can be costly. We will compare against agents 
that differ only in their communications medium (broadcast 
beacons), but nevertheless can do the task much more rapidly. 

In this paper we first discuss existing literature in 
pheromones, swarm robotics, and collaborative construction. 
We then explain classic compass-straightedge geometric con¬ 
struction and its background. We then detail the pheromone 
and swarm agent model being used, and describe the basic 
procedures necessary to do compass-straightedge construc¬ 
tion. Finally, we compare this approach against similar agents 
using broadcast beacons instead of pheromones. 

Previous Work 

Swarm robotics and swarm agent research is naturally inspired by social insects (Brambilla et al., 2013) and stigmergic approaches to collective behavior (Dorigo et al., 2000). Swarms are highly parallel, can be built with simple agents, and are robust in the face of noise, agent failure, and dynamic environments (Bonabeau, 1996). Swarms are commonly used in tasks such as exploration and foraging (Wodrich and Bilchev, 1997; Panait and Luke, 2004; Russell et al., 2015; Prabhakar et al., 2012), but one recent application has been in collective construction (Ardiny et al., 2015). Swarms are typically limited to indirect, stigmergic, and local communication, which leads to one of the three following implementation trends:

First: agents may use inert, local, environmental features,
which can be sensed but do not directly communicate with 
the agents or with each other. Landmarks or the presence of 
other agents are common features which can be used to clear 
ground for site preparation (Parker and Zhang, 2006) and 
build circles around given locations (Pitonakova and Bullock, 
2013). Agents using these techniques do not need accurate 
localization and can use a variety of building materials to 
perform their tasks. However, the agents cannot easily do 
planning or coordination as they neither know where they are 
nor what has been accomplished so far. 

Second: agents may exist in or create “smart” environ¬ 
ments which can be used to localize the agents, such as 
writable blocks (Allwright et al., 2014; Werfel and Nagpal, 
2008; Sugawara and Doi, 2014) or countable building materi¬ 
als (Werfel et al., 2011). This allows grid-world models to be 
directly implemented with robots and has produced swarms 
capable of making 3D user-defined shapes. The use of such 
environments allows agents to be fully localized relative to 
the structure they are building, but they must use highly spe¬ 
cialized building materials or contrived environments. 

Third: agents may lay temporary stigmergic markers 
in the environment, such as breadcrumbs or pheromones 
(Deneubourg et al., 1990; Russell, 1999; Panait and Luke, 
2004; Chibaya and Bangay, 2007). This technique has pro¬ 
duced behaviors such as circle building (Pitonakova and Bul¬ 
lock, 2013), exploration (Wodrich and Bilchev, 1997), and 
wall building (Stewart and Russell, 2006). One difficulty 
with the method lies in the medium in which these stigmer¬ 
gic markers are placed. In prior work we have attempted 
to address this with portable local beacons (Russell et al., 
2015) but other methods, such as RFID tags (Mamei and 
Zambonelli, 2007; Ziparo et al., 2007), lights (Stewart and 
Russell, 2006), and chemical dispersion (Kowadlo and Rus¬ 
sell, 2008) have also been tried. 

Our work in this paper is most comparable to that of “smart” 
environments, such as in Werfel et al. (2011), as it allows 
the collective construction of a very large number of possible 
structures: but instead of using specially-made stigmergic 
construction materials, we explore whether such tasks may 
be performed solely through a general-purpose indirect com¬ 
munication method such as pheromones. 

Compass-Straightedge Construction 

The compass-straightedge technique has existed since the ancient Greeks and has a long history of constructive proofs to build complex shapes and do many nontrivial geometric tasks. Traditionally the only two tools permitted are a collapsing compass and an arbitrarily long straightedge. The compass loses its angle as soon as it is lifted off the drawing surface, and so one cannot preserve distance by raising the compass and moving it somewhere else. However, this is an artificial limitation, as there is a way to transfer distance between two points using a finite number of axiomatic steps (Sarhangi, 2007). It has also been shown that the straightedge is not required at all, only a compass (Mohr, 1672; Mascheroni, 1797), but the resulting proofs can be much more complicated.

Figure 1: The five basic compass-straightedge geometry procedures. (A) Drawing a line between two points. (B) Drawing a circle centered at one point and passing through another. (C) Identifying the point at the intersection of two lines. (D) Identifying and distinguishing the two points at the intersection of two circles (one point if they are tangent). (E) Identifying and distinguishing the two points at the intersection of a circle and a line (one point if they are tangent).

Figure 1 shows the five basic abstract procedures sufficient 
to do all compass-straightedge construction: drawing a line 
through two points, drawing a circle centered at one point and 
passing through another, identifying the intersection of two 
lines, identifying the intersections of a line and a circle, and 
identifying the intersections of two circles. Note that for the 
last two procedures, not only must one identify the points, but 
one must also uniquely identify and distinguish them from 
one another. This problem does not arise for human proofs, 
which are visual: but ants have only local information and so 
must include ways to distinguish the points. 
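To make the "identify and distinguish" requirement concrete, the following sketch computes the intersections of two circles analytically and labels them by side. It is a plain coordinate-geometry illustration with global knowledge, not the ants' local procedure; the function name and the left/right convention (as seen when walking from the first center toward the second) are assumptions made for the example.

```python
from math import hypot, sqrt


def circle_intersections(c1, r1, c2, r2, eps=1e-12):
    """Intersection points of two circles, ordered [left, right] relative to
    the direction from c1 to c2. Returns [] if there are none (or the centers
    coincide) and a single point if the circles are tangent."""
    (x1, y1), (x2, y2) = c1, c2
    d = hypot(x2 - x1, y2 - y1)
    if d < eps or d > r1 + r2 + eps or d < abs(r1 - r2) - eps:
        return []
    # Distance from c1 to the chord joining the intersections, along the center line.
    a = (r1**2 - r2**2 + d**2) / (2 * d)
    h_squared = r1**2 - a**2
    ux, uy = (x2 - x1) / d, (y2 - y1) / d   # unit vector from c1 to c2
    mx, my = x1 + a * ux, y1 + a * uy       # foot of the chord on the center line
    if h_squared <= eps:
        return [(mx, my)]
    h = sqrt(h_squared)
    left = (mx - h * uy, my + h * ux)       # unit vector rotated by +90 degrees
    right = (mx + h * uy, my - h * ux)
    return [left, right]


if __name__ == "__main__":
    # Two unit circles centered on the endpoints of a unit segment:
    # each intersection is the apex of an equilateral triangle on that segment.
    print(circle_intersections((0.0, 0.0), 1.0, (1.0, 0.0), 1.0))
```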

Composing these simple techniques, one can build much 
more complicated constructions. One simple example for 
constructing an equilateral triangle is as follows: start with 
a line segment representing the desired base; use a compass 
to construct two circles centered at the endpoints with radius 
equal to the segment length; and, finally, draw two more 
line segments connecting the desired intersection point to the 
two endpoints of the base. Other basic things which can be 
constructed include: bisecting arbitrary angles with a line; 
constructing a square with twice the area of another square; 
constructing a circle tangent to another circle; trisecting an 
arbitrary line segment; building any regular polygon
whose number of edges is equal to some power of 2 times
the product of zero or more distinct primes of the form $2^{2^n} + 1$, for
some integer n; and many, many more. Overall the first ten
constructible polygons have 3, 4, 5, 6, 8, 10, 12, 15, 16, and,
thanks to Gauss, 17 sides (Gauss, 1801).
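The polygon criterion above is easy to check mechanically. The sketch below tests whether n is a power of 2 times a product of distinct Fermat primes, using the five Fermat primes currently known; the function names are illustrative.

```python
FERMAT_PRIMES = (3, 5, 17, 257, 65537)  # the only Fermat primes currently known


def is_constructible(n):
    """True if a regular n-gon is compass-straightedge constructible
    according to the criterion stated above."""
    if n < 3:
        return False
    while n % 2 == 0:          # remove the power of 2
        n //= 2
    for p in FERMAT_PRIMES:    # each Fermat prime may divide n at most once
        if n % p == 0:
            n //= p
    return n == 1


def first_constructible(count):
    """The first `count` numbers of sides of constructible regular polygons."""
    result, n = [], 3
    while len(result) < count:
        if is_constructible(n):
            result.append(n)
        n += 1
    return result


if __name__ == "__main__":
    print(first_constructible(10))
```

The heptagon (7) and the nonagon (9 = 3 × 3, a repeated Fermat prime) are both rejected, matching the non-constructible examples discussed in the text.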

There are many things that cannot be constructed: any 
regular polygon not of the form above (such as the heptagon 
or the nonagon), trisections of arbitrary angles, squares with 
equal area to arbitrary circles (the legendary squaring the 
circle ), and many other classes of shapes. Several exten¬ 
sions have been proposed: for example, the ability to make 
marks on the straightedge permits angle trisection and the 
construction of additional regular polygons (Gleason, 1988).

A: 10% B: 1% C: 0.1% D: 0.01%

Figure 2: Effect of Evaporation on Circle Development. Too 
high evaporation (A) produces incorrectly small and overly 
noisy circles, but too low evaporation (D) produces octagons. 

The Ant Model 

Our ants’ world is a non-toroidal, bounded 200x200 square
grid environment holding 1000 ants. Any number of ants 
may share the same grid cell, and can move horizontally, ver¬ 
tically, or diagonally. Each grid cell can also hold multiple 
kinds of pheromones. The ants may move about in the envi¬ 
ronment, may read pheromone values in their local (9-cell) 
neighborhood, and may write or update pheromone values 
only to their current cell. As shown later, a grid world is not 
critical to the model, but was chosen for simplicity. 

Pheromones are used by the ants for various tasks: to mark 
points of interest, to establish gradients to and from those 
points of interest, to build up estimates of shapes, and to make 
final line drawings in the environment. Each pheromone has 
a pheromone type and a current numerical value > 0 which 
by default automatically reduces ( evaporates ) at a rate of 
0.5% per timestep. A pheromone in a cell can also be set to 
be non-evaporating. Non-evaporating pheromones are used 
to draw lines and circles and to mark points of interest, which 
in turn serve as permanent maxima in a pheromone gradient. 
We call cells holding non-evaporating pheromones sites. 

Because we ultimately will migrate this model to robots, 
the pheromones in the model do not diffuse on their own, but 
rather pheromone information can only be spread from one 
cell to another by an ant. This is because, while diffusion 
is used in biological pheromone models, it is not easy to 
employ with physical robots, as it requires chemical sensors 
and dispersion methods (Kowadlo and Russell, 2008), or cells 
or breadcrumbs which communicate with one another. 

Lacking diffusion, evaporation becomes very useful for 
a grid-world model as it causes the ants to naturally build 
gradients with more circular and less octagonal cross-sections. 
This results in straight paths at any angle, not just in the 
eight compass directions, and more circular circles (see for 
example Figure 2). Additionally, evaporation can be seen as 
a potential benefit in more interesting dynamic environments 
where old information should be treated skeptically. The 
disadvantage of evaporation is that it is a major source of 
noise: agents cannot rely on pheromone gradients being 
consistent at infrequently visited locations as neighboring 
cells may have been allowed to evaporate while the current 
cell has just recently been “topped off”. This significantly 
complicates our task and makes our procedures less efficient. 


The Ants Each ant is a simple machine which iteratively 
updates pheromone values in its cell, then either follows a 
procedure to perform some task, or (with 0.1 probability) 
moves randomly. Random movement encourages exploration
and nondeterminism; some procedures may temporarily disable
it so as to reduce noise.

An ant can move in any of the eight compass directions. 
If an ant moves horizontally or vertically, it must wait one 
timestep before it may continue. If an ant moves diagonally, it 
must wait 1.5 steps (a discrete approximation of √2, though
neither has any real impact on the results over just using 1.0). 

An ant can also sense pheromones in its grid cell or any 
of the eight neighboring cells, and can determine if those 
pheromones are set to evaporate or not. An ant also knows 
where it was last timestep, and can perceive relative orientation
(“to the left of me”, etc.). An ant can set a temporary
timer to know roughly how long it has been doing a task. 
Finally, an ant is capable of storing a single pheromone value 
in its sole register for later retrieval. 

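Updating Pheromones Every timestep, the ant first updates
all the pheromones at its cell location. If a pheromone
P at the ant’s location i is marked as non-evaporating, the ant
leaves it alone. Otherwise, it is updated as:

$$P_i \leftarrow \max_{j} \begin{cases} \alpha^{\sqrt{2}} P_j & \text{if } j \text{ is a diagonal neighbor of } i \\ \alpha P_j & \text{otherwise} \end{cases}$$

where j ranges over the cells neighboring i.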

We set α to 0.88 based on experiment. As the ants wander
about, this equation effectively builds a pheromone gradient
away from the “tent-pole” locations in the grid (where a
non-evaporating value for P has been set). Note that even though
$\alpha^{\sqrt{2}}$ is used to (properly) cut down diagonal neighboring
cells, this is not sufficient to create a true circular gradient in
a grid world.
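
One possible reading of this update rule is sketched below. It assumes the new value of a cell is the largest neighbor value after discounting by α (orthogonal neighbors) or α raised to √2 (diagonal neighbors). The function name and the list-of-lists grid are hypothetical.

```python
import math

ALPHA = 0.88  # value reported in the text


def updated_value(grid, row, col):
    """Return the discounted maximum over the eight neighbors of (row, col).

    grid is a list of lists of floats; cells outside the grid are ignored.
    """
    best = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell itself
            r, c = row + dr, col + dc
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
                diagonal = dr != 0 and dc != 0
                factor = ALPHA ** math.sqrt(2) if diagonal else ALPHA
                best = max(best, factor * grid[r][c])
    return best


# A single "tent-pole" of value 1.0 in the bottom-right corner.
grid = [[0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0],
        [0.0, 0.0, 1.0]]
orthogonal = updated_value(grid, 2, 1)  # directly left of the tent-pole
diagonal = updated_value(grid, 1, 1)    # diagonally adjacent to it
```

The diagonal cell receives a smaller value than the orthogonal one, which is the correction the text describes.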

The Procedures 

As this paper is a proof of concept of a challenging task, 
and the procedures below are nontrivial, we describe them in 
detail for replicability, and beg the reader’s forgiveness. 

Though we describe the ant’s procedures below in algorithmic
pseudocode, we in fact implement each of them as
a finite-state automaton (a DFA) which iterates through its
process one step at a time as it is pulsed. Each state in the
automaton is associated with some behavior to iteratively
perform, which may be some simple action (like laying a
pheromone) or a call to a lower-level procedure (such as wall
following). Such a recursive DFA is known as a Hierarchical
Finite-State Automaton (HFA). Any procedure can signal
done, which informs its calling procedure that it has finished.
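
The pulsed, hierarchical structure can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class names `Procedure`, `Wait`, and `Sequence` are hypothetical, and returning `True` from `pulse()` stands in for signaling done.

```python
class Procedure:
    """A state machine that advances one step each time it is pulsed."""

    def pulse(self) -> bool:
        """Advance one step; return True once the procedure signals done."""
        raise NotImplementedError


class Wait(Procedure):
    """Leaf behavior: a simple action repeated for a fixed number of pulses."""

    def __init__(self, steps: int):
        self.remaining = steps

    def pulse(self) -> bool:
        self.remaining -= 1
        return self.remaining <= 0


class Sequence(Procedure):
    """Higher-level automaton whose states are lower-level procedures."""

    def __init__(self, *children: Procedure):
        self.children = list(children)
        self.index = 0

    def pulse(self) -> bool:
        if self.index >= len(self.children):
            return True
        # Pulse the current child; advance to the next state when it is done.
        if self.children[self.index].pulse():
            self.index += 1
        return self.index >= len(self.children)


def run(procedure: Procedure) -> int:
    """Pulse until done and return the number of pulses used."""
    pulses = 1
    while not procedure.pulse():
        pulses += 1
    return pulses


nested = Sequence(Wait(2), Sequence(Wait(1), Wait(3)))
total_pulses = run(nested)
```

Because a `Sequence` is itself a `Procedure`, automata can be nested to any depth, and each pulse performs exactly one step of whichever leaf behavior is currently active.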

The procedures make heavy use of building gradients
from various points in order to establish loci. They take
two kinds of arguments: pheromones, which are capitalized
(like PointA), and simple numerical values, which are lowercase
(like direction or m). A pheromone typically establishes
a gradient leading to a maximal spot in the environment. For
brevity, we often use the pheromone name (like PointA) to
also refer to the location of its maximum (typically a site).
There is always a single starting location for the ants: Home.

Some procedures assume a world border; however, in an
environment with no border it could just as well be replaced
with following along the locus of points for which the Home 
gradient is equal to some very small value. Only cells within 
the world border are considered valid places to move and 
read pheromones from. 

We first describe various basic procedures, then the five 
geometric procedures. Here is a summary of the five: 

• Draw a Circle Draw a line segment from the center 
to the edge point and use it to determine the radius (in 
terms of pheromone gradient from the center). Build a 
gradient from the center. Identify points whose gradient 
value matches the radius. Trace a line through those points. 

• Mark Line/Line Intersection This is simple: fan out 
until the intersection is discovered, then mark it. 

• Mark Circle/Circle Intersections Identify the intersections
and determine which is which. To do this, each ant
randomly chooses to go to one or the other of the two 
circle centers. It then heads towards the other center until 
it finds the circle edge of its initial center, then follows 
clockwise along the circle until it reaches an intersection. 
The intersection is uniquely labeled according to which 
circle was followed when it was discovered. The process 
is illustrated below: 



• Mark Line/Circle Intersections Again, the trick is to 
distinguish the intersections from each other. To do this, 
the ants go to a certain extreme point on the line outside 
the circle, then follow down the line, and as they find 
intersections, mark them in the order they were found. 

• Draw an Extended (“Infinite”) Line This is tricky.
Noise and the restriction to local information make the
obvious approaches impossible for drawing an extended
line, such as going down the gradient away from the circle
centers, or walking straight using some notion of momentum.
Instead we set up a perpendicular bisector. The ants draw a
circle centered at the first point and passing through the
second point, then a circle centered at the second point,
passing through the first. They identify the two circle/circle
intersections, then draw the extended line that is the
perpendicular bisector between the two intersection points.
The various elements are shown below:



Basic Procedures 

Backup() Back up one step, to the ant’s previous location. 
Further backups are not possible (the ant has no history). 

GoUp(P) Iteratively, in the current nine-cell neighborhood 
around the ant, move to the cell where the value of pheromone 
P is highest. Break ties randomly. If the highest value is zero, 
MoveRandomly( ). When at a local maximum, signal done. 

GoHome() The same as GoUp(Home). 

GoDown(P, m (default = 0)) Iteratively, in the current nine-cell
neighborhood around the ant, move to the cell where the
value of pheromone P is lowest. Break ties randomly. If the 
lowest value is at or below m or the ant cannot move because 
the surrounding area has a higher gradient (as happens at the 
world border), signal done. 
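
A single step of GoUp might look like the sketch below, which assumes a pheromone stored as a list of lists and treats cells outside the grid as invalid. The function name `go_up_step` and its return convention (`None` for done) are hypothetical.

```python
import random


def go_up_step(grid, row, col, rng=random):
    """Return the next cell for GoUp, or None at a local maximum (done)."""
    rows, cols = len(grid), len(grid[0])
    # The nine-cell neighborhood, restricted to cells inside the world border.
    cells = [(r, c)
             for r in range(row - 1, row + 2)
             for c in range(col - 1, col + 2)
             if 0 <= r < rows and 0 <= c < cols]
    best = max(grid[r][c] for r, c in cells)
    if best == 0:
        # No gradient nearby: fall back to a random move (MoveRandomly).
        return rng.choice([cell for cell in cells if cell != (row, col)])
    if grid[row][col] == best:
        return None  # local maximum: the procedure would signal done
    # Break ties randomly among the highest-valued cells.
    return rng.choice([cell for cell in cells if grid[cell[0]][cell[1]] == best])


grid = [[0.1, 0.2, 0.3],
        [0.2, 0.5, 0.6],
        [0.3, 0.6, 1.0]]
position = (0, 0)
path = [position]
while (nxt := go_up_step(grid, *position)) is not None:
    position = nxt
    path.append(position)
```

GoDown is the mirror image: it picks the lowest-valued cell and stops once the value is at or below m or no lower cell is reachable.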

MoveRandomly() Move to a random location in the nine-cell
neighborhood around the ant. Because our environment
is bounded, we include an additional protective measure: if 
the ant reaches a world border, GoHome(). 

MarkSite(P) Set the value of pheromone P at the ant’s 
location to maximal, and mark it non-evaporating. 

LoadRegister(P) Store in the ant’s register the value of 
pheromone P at the ant’s location. 

FollowWorldBorder(direction) Temporarily turn off randomness
so as not to lose the border. Head along the border
of the world in direction (clockwise or counterclockwise).

FollowLine(P, End) Temporarily turn off randomness so as
not to lose the line. Among the eight-cell neighborhood
around the ant, iteratively move to the cell where P is highest.
Break ties by preferring forward-facing directions. When the
ant has reached a cell with the End pheromone, signal done.

WallFollow(P) Head clockwise such that pheromone P is 
always maximal in the cell immediately to the ant’s right 
(that is, follow along a “wall” of cells marked as sites for P). 

MakeLine(PointA, PointB) This procedure sets up the gradients
for a straight line from PointA to PointB, but does not
draw it. Iterate: GoUp(PointA), then GoUp(PointB). This
causes the ants to go back and forth between the points,
optimizing the trail until it is straight. After some time (perhaps
5000 steps), signal done.

BuildGradientDown(P, m (default = 0)) Build a gradient
away from P, stopping when the gradient is well established
down to value m. This is done by repeatedly iterating:
GoUp(P), then GoDown(P, α × m). The α makes the ant
go one cell further than needed. The ant initially doesn’t go
straight down, but makes many random moves, and so we
only signal done when GoDown(...) signals done and all the
neighboring cells around the ant have nonzero values for P,
indicating that it has likely built out the gradient well.

DrawLine(PointA, PointB, Trace) This procedure draws
a straight line of pheromone Trace from PointA to PointB.
First, MakeLine(PointA, PointB). Then GoUp(PointA).
Temporarily turn off random moves to make a straight line, then
GoUp(PointB) while calling MarkSite(Trace) on each new
grid cell. Upon reaching PointB, signal done.

MakePerpendicularBisectorLine(PointA, PointB, MarkA,
MarkB, Temp) The bisector line is the locus of cells where
the gradients from points PointA and PointB are equal. Its
ends are defined by MarkA and MarkB. This splits the swarm
into two groups to build the two gradients in parallel.

Randomly do either: (1) GoUp(PointA), then
MakeBisectorHalf(PointA, PointB, MarkA, MarkB, Temp); or
(2) GoUp(PointB), then MakeBisectorHalf(PointB, PointA,
MarkB, MarkA, Temp). Then signal done.

MakeBisectorHalf(MyPoint, OtherPoint, MyMark, OtherMark,
Temp) This handles one sub-swarm. First,
BuildGradientDown(MyPoint). While doing so, when the pheromone
value of MyPoint is less than or equal to the pheromone value
for OtherPoint, MarkSite(Temp); and whenever Temp is set at
the ant’s current location but the pheromone value of MyPoint
is greater than the pheromone value of OtherPoint, remove it,
as it has been set incorrectly due to pheromone evaporation.

Occasionally (with 0.1 probability) stop gradient-building
and do the following. GoDown(OtherPoint) until the agent
hits the world border, then FollowWorldBorder(clockwise)
until one of three things happens: (1) If the value of OtherPoint
is greater than MyPoint, the ant is too far: GoUp(MyPoint),
then continue BuildGradientDown(MyPoint) as before. (2) If
the ant finds a cell with Temp set, this is the far end of the line.
MarkSite(MyMark), FollowWorldBorder(counterclockwise)
a short distance (perhaps 100 steps), GoUp(MyPoint),
GoUp(OtherPoint), GoUp(Home), and GoUp(MyPoint), then
continue to BuildGradientDown(MyPoint) as before. This
spreads the MyMark pheromone. (3) If the ant finds
a non-zero MyMark gradient, the task is already completed:
GoUp(MyPoint) and continue to
BuildGradientDown(MyPoint) as before. Whenever both MyMark and
OtherMark have been set at the ant’s cell, GoUp(MyPoint),
then BuildGradientDown(MyPoint): this erases Temp in all
cells. After some time (perhaps 1000 steps), signal done.

The Five Geometric Construction Procedures 

DrawCircle(Center, EdgePoint, Temp, Circle) This procedure
draws a circle with pheromone Circle, centered at
Center, and which passes through EdgePoint. A temporary
and much thicker circle is marked first, then the outer border
traced, since fluctuations in pheromones and random movements
of the ants can cause variation in what the ants perceive
as being the correct distance from the center.

MakeLine(Center, EdgePoint), then GoUp(Center), then
GoUp(EdgePoint). At this point, LoadRegister(Center) to
measure the gradient from Center, which will define the
radius of the circle. GoUp(Center). Next, repeatedly iterate
through BuildGradientDown(Center, register), then
GoDown(Center), then MarkSite(Temp). This causes the
ant to mark the outer edge of the circle with Temp. After
some time (perhaps 3000 steps), enough marks have been set
to form a solid circular wall. At this point, the ant must find
its way to the outside of this wall. To do this, GoUp(Center),
then GoDown(Center, β × register), which causes the agent
to move out well beyond the circle whose radius is defined by
register, then GoUp(Center) until it finds a cell with a Temp
pheromone value (the wall). We set β = 0.01 based on
experiment. Finally, trace the circle: WallFollow(Temp) while
simultaneously doing MarkSite(Circle) on each new grid cell.
After some time (perhaps 500 steps), signal done.
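
To see why a register value can stand in for a radius, and how far β × register pushes the ant, consider an idealized gradient in which the Center pheromone falls by a factor α per cell of distance. This is an assumption made only for illustration: the real gradient is noisy and not perfectly circular. The function names and the example radius of 10 cells are hypothetical.

```python
import math

ALPHA = 0.88  # per-cell discount reported in the text
BETA = 0.01   # overshoot factor reported in the text


def ideal_gradient(distance: float) -> float:
    """Pheromone value at a given distance from a site of value 1.0."""
    return ALPHA ** distance


def ideal_distance(value: float) -> float:
    """Inverse of ideal_gradient: the distance at which the value occurs."""
    return math.log(value) / math.log(ALPHA)


register = ideal_gradient(10.0)       # value loaded at an EdgePoint 10 cells out
outer_threshold = BETA * register     # stopping value for GoDown(Center, ...)
overshoot = ideal_distance(outer_threshold) - ideal_distance(register)
```

In this idealization the overshoot is log(β)/log(α), about 36 cells regardless of the circle's radius, which is consistent with the ant ending up well outside the temporary wall.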

MarkCircleCircleIntersections(CenterA, CenterB, TempA,
TempB, CircleA, CircleB, Mark1, Mark2) This procedure
identifies and uniquely distinguishes the intersections of two
circles, one centered at CenterA and traced with CircleA, and
one centered at CenterB and traced with CircleB. The Temp
pheromones were those used to generate the original circles:
they are called upon again to assist in wall-following.

First randomly GoUp(CenterA) or GoUp(CenterB). If the
ant chose CenterA, then GoUp(CenterB) until it finds CircleA,
then WallFollow(TempA) until the ant reaches a cell with both
CircleA and CircleB: then MarkSite(Mark1). If the ant chose
CenterB, then GoUp(CenterA) until it finds CircleB, then
WallFollow(TempB) until the ant reaches a cell with both
CircleA and CircleB: then MarkSite(Mark2). In either case,
GoHome() and stay there for a while (perhaps 500 steps) to
ensure other ants have seen the markings, then signal done.

MarkCircleLineIntersections(PointA, PointB, Circle, Line,
Mark1, Mark2) This procedure identifies and uniquely
distinguishes the (up to) two intersections of a circle traced out
with Circle, and a line delimited by PointA and PointB, and
traced out with Line. To make things simple, we assume that
PointA and PointB are at the extrema of the line: this is not
a problem, as the procedure for marking extended lines will
mark the points at the borders of the environment.

First GoUp(PointA), then FollowLine(Line, PointB), and as
the ant is doing so, MarkSite(...) the first intersection (which
has both Line and Circle pheromones) with Mark1, and the
second such intersection, if any, with Mark2.

MarkLineLineIntersection(LineA, LineB, Mark) This
identifies the intersection between two lines. One simple
(and inefficient) approach is to search the space until we have
discovered the unique intersection, then mark it.

GoUp(Mark) (which moves randomly unless Mark is nonzero)
until the agent has found a cell marked as both LineA
and as LineB: at this point, MarkSite(Mark) then GoHome(),
then signal done. If another ant finds the intersection first,
we may see the Mark pheromone already, in which case
GoHome() to tell the other ants, then signal done.

Figure 3: Building an Equilateral Triangle from Two Provided Points. (A) Emerging from Home. (B) Building the first line
between the provided points. (C) Building the circle. (D) Building the second circle. (E) Determining the intersections, then
building the second line. (F) Tracing the third line. (G) Finished.


DrawExtendedLine(PointA, PointB, TempA, TempB,
CircleA, CircleB, IntersectionA, IntersectionB, MarkA,
MarkB, Line) This procedure requires nine distinct
pheromones because of its approach to drawing an extended
(arbitrarily long) line between two points PointA and PointB.
It first draws circles with PointA and PointB each as centers
and passing through the other point, respectively. It then
identifies the intersections of these circles. The extended line
is the perpendicular bisector of these two intersection points.

The procedure is as follows. First DrawLine(PointA,
PointB, Line). Then DrawCircle(PointA, PointB, TempA,
CircleA). Then DrawCircle(PointB, PointA, TempB,
CircleB). Then MarkCircleCircleIntersections(PointA, PointB,
TempA, TempB, CircleA, CircleB, IntersectionA,
IntersectionB). Next, MakePerpendicularBisectorLine(IntersectionA,
IntersectionB, MarkA, MarkB) to identify the extrema of the
extended line. Finally, DrawLine(MarkA, PointA, Line), then
DrawLine(MarkB, PointB, Line).
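
The geometry behind this construction can be checked in the continuous plane. The sketch below is not the ants' grid procedure: it computes the two circle/circle intersections directly and confirms that PointA and PointB are each equidistant from both, so the line through PointA and PointB is the perpendicular bisector of the two intersections. Function and variable names are hypothetical.

```python
import math


def circle_circle_intersections(a, b):
    """Intersections of the circle centered at a through b with the circle
    centered at b through a (both have radius |ab|)."""
    ax, ay = a
    bx, by = b
    d = math.hypot(bx - ax, by - ay)
    mx, my = (ax + bx) / 2, (ay + by) / 2   # midpoint of the segment ab
    h = math.sqrt(d * d - (d / 2) ** 2)     # height of the equilateral triangle
    ux, uy = (bx - ax) / d, (by - ay) / d   # unit vector from a to b
    # Step from the midpoint along the unit normal (-uy, ux), both ways.
    return (mx - uy * h, my + ux * h), (mx + uy * h, my - ux * h)


point_a, point_b = (0.0, 0.0), (2.0, 0.0)
first, second = circle_circle_intersections(point_a, point_b)

# Points equidistant from both intersections lie on their perpendicular bisector.
a_on_bisector = math.isclose(math.dist(point_a, first), math.dist(point_a, second))
b_on_bisector = math.isclose(math.dist(point_b, first), math.dist(point_b, second))
```

Either intersection, together with PointA and PointB, also forms the equilateral triangle of Figure 3, since all three sides equal the common radius.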

Demonstration and Comparison 

At this stage, if a shape can be provably built with compass- 
straightedge geometry, the ants can theoretically build it, 
given a finite-state automaton coded with the steps necessary. 
As a simple example, Figure 3 shows the process of building 
an equilateral triangle from two prespecified points. Using
this approach, we have built a variety of structures: see for
example the hexagon and angle bisector in Figure 4. Noise can
cause some failures: in our simulator, triangles presently have
a 95% success rate, angle bisections 91%, and hexagons
87%. We believe these rates can still be improved.

While we have demonstrated that a basic pheromone model
is capable of a nontrivial task such as this, we admit that it is
not efficient: the agents build a variety of gradients throughout
the environment, and if the system must use evaporation,
then they must also maintain gradients until they have finished
a subtask. Because of the locality of the pheromone
model, and the need to move randomly, the approach is also
quite noisy. As can be seen in Figures 3 and 4, the resulting
lines, shapes, and angles are not ideal.

Figure 4: Hexagon (left) and Angle Bisection (right).

Broadcasting With a small amount of global embedded
communication, we can dramatically outperform a pheromone
model. To show this, we compare the model against the same
model augmented with broadcast beacons: objects which an
ant may deploy at any time and associate with a pheromone.
Ants can detect the presence, distance to, and relative angle to
a broadcast beacon anywhere in the environment. As a result,
they can easily follow along a circle (the locus of points a
certain distance from a beacon), or head down a line passing
through two beacons (the locus of points where both beacons
are at the same relative angle or opposite angles). This
eliminates the need to build and maintain gradients, and so ants
with broadcast beacons can generally complete most tasks
dramatically faster. Furthermore, as they do not use local
updating on a square grid, broadcast beacons’ “gradients”,
so to speak, are circular without evaporation.
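
The two loci mentioned above can be written as simple membership tests. This is a sketch in continuous coordinates with hypothetical function names and tolerances; it is not the authors' sensing model.

```python
import math


def on_circle(position, beacon, radius, tolerance=0.5):
    """True when position is (about) radius away from the beacon."""
    return abs(math.dist(position, beacon) - radius) <= tolerance


def on_line(position, beacon_a, beacon_b, tolerance=1e-6):
    """True when both beacons lie at the same or opposite bearing."""
    angle_a = math.atan2(beacon_a[1] - position[1], beacon_a[0] - position[0])
    angle_b = math.atan2(beacon_b[1] - position[1], beacon_b[0] - position[0])
    # sin of the bearing difference vanishes for equal or opposite bearings.
    return abs(math.sin(angle_a - angle_b)) <= tolerance


beacon_a, beacon_b = (0.0, 0.0), (2.0, 2.0)
beyond = on_line((5.0, 5.0), beacon_a, beacon_b)    # both beacons behind the ant
between = on_line((1.0, 1.0), beacon_a, beacon_b)   # beacons on opposite sides
off_line = on_line((1.0, 0.0), beacon_a, beacon_b)
on_ring = on_circle((3.0, 4.0), beacon_a, radius=5.0)
```

An ant that keeps `on_line` true while moving away from both beacons traces the extended line without needing any gradient.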

Except for the addition of broadcast beacon deployment
and sensing, the ants are the same: in fact the revised
procedures work largely the same way as the all-pheromone
approach. One exception: because relative angle to beacons
is reliable, the ants can do DrawExtendedLine(...) by simply
going away from both beacons (that is, having both beacons
directly behind the ant), though the original perpendicular
bisector approach could still have been used.

For some examples, consider Figure 5, and compare 
against the same tasks done in Figures 3 and 4: there is 
something to be said for globally accessible information. 1 

Here we describe the additional basic procedures, then the 
geometric construction procedures, for ants using beacons. 

1 With three beacons, you could just do triangulation! But we 
want to show the ants’ capability without sophisticated trigonometry.

Further Basic Procedures with Broadcast Beacons


GotoOrPlaceBeacon(P) If a beacon for P exists, head towards
the beacon, and signal done on arrival. If the beacon
does not exist, GoUp(P) until a site is discovered marked 
with P, then place a beacon for P at that site, and signal done. 

GotoOrPlaceBeaconShortCircuit(P) If a beacon for P 
exists, simply signal done. If the beacon does not exist, 
GoUp(P) until a site is discovered marked with P, then place 
a beacon for P at that site, and signal done. 

DrawLine(LineA, LineB, P)
GotoOrPlaceBeaconShortCircuit(LineB). GotoOrPlaceBeacon(LineA). This ensures
that beacons are located at both LineA and LineB, and that
the ant is at LineA. Then head along the line between LineA
and LineB, towards LineB, drawing the line with pheromone
P. When at LineB, signal done.

Geometric Construction with Broadcast Beacons 

MarkLineLineIntersection(Line1A, Line1B, Line2, B)

GotoOrPlaceBeacon(Line1A). Then head down the line
which passes through Line1A and Line1B. It doesn’t matter
what direction. If the ant has gone “too far” (we define
this as being at the world border, but any large measure is
fine), turn around and head the other direction. When the
ant discovers the pheromone Line2, denoting the trace of the
second line, this is the intersection of the two lines. Place
beacon B at this position and signal done.

MarkCircleLineIntersections(LineA, LineB, Circle, A, B)

GotoOrPlaceBeacon(LineA). Then head down the line which 
passes through LineA and LineB in the direction of (and past) 
LineB. When the ant has gone “too far”, turn around and head 
the other direction. When the ant discovers the pheromone 
Circle , denoting the first intersection of the line and circle, 
place beacon A at this position. Then head down the line past 
LineA. When the ant has again gone “too far”, turn around 
and head the other direction. When the ant discovers the 
pheromone Circle , denoting the second intersection, place 
beacon B at this position and signal done. 

MarkCircleCircleIntersections(CenterA, CenterB, CircleA,
CircleB, A, B) GotoOrPlaceBeacon(CenterA), then
GotoOrPlaceBeacon(CenterB). Next, randomly head either
to CenterA from CenterB or to CenterB from CenterA. If the
ant goes to CenterA, then when the ant finds the circle traced
with CircleB, follow clockwise along the CircleB pheromone
until an intersection point is found (having both CircleA and
CircleB), then PlaceBeacon(A). Continue along CircleB until
another intersection point is found, then PlaceBeacon(B). On
the other hand, if the ant goes to CenterB, then when the ant
finds the circle traced with CircleA, follow clockwise along
the CircleA pheromone until an intersection point is found
(having both CircleA and CircleB), then PlaceBeacon(B).
Continue along CircleA until another intersection point is
found, then PlaceBeacon(A). Either way, finally signal done.



Figure 5: Equilateral Triangle (left), Hexagon (center), and 
Angle Bisection (right) using Broadcast Beacons. Compare 
with Figures 3 and 4. 


Figure 6: Circle (left) and Perpendicular Line Bisector (right)
using a simulator for robots with deployable sensor motes. 

DrawExtendedLine(LineA, LineB, P)
GotoOrPlaceBeaconShortCircuit(LineA). Then GotoOrPlaceBeacon(LineB).
This ensures that beacons are located at both LineA and LineB,
and that the ant is at LineB. Then head along the line between
LineB and LineA, towards and ultimately past LineA, drawing
the line with pheromone P. When the ant has gone “too far”,
GotoOrPlaceBeacon(LineA). Once at LineA, head along the
line between LineA and LineB, towards and ultimately past
LineB, drawing the line with pheromone P. When the ant has
again gone “too far”, signal done.

DrawCircle(Center, EdgePoint, P)
GotoOrPlaceBeaconShortCircuit(Center). Then GotoOrPlaceBeacon(EdgePoint).
This ensures that beacons are located at both Center and
EdgePoint, and that the ant is at EdgePoint. LoadRegister(Center).
Then head along the path (either direction) where the Center
pheromone is equal to the register (this essentially traces
along the circle), while drawing the line with pheromone P.
When the ant is back at EdgePoint, signal done.

Conclusion 

We have demonstrated through proof of concept that, using 
only non-diffusing pheromones as a communication model, 
a swarm of ants can work together to perform compass- 
straightedge construction. This is a nontrivial task, but it 
demonstrates that indirect communication can enable
sophisticated collaborative work.

This demonstration also illustrates a potential weakness in
our pheromone model: it can be very inefficient, as whole
areas must be painted with pheromones and perhaps constantly
updated. For this reason we feel our demonstration falls near
the limit of what these models are realistically capable of
supporting. The broadcast beacons method, on the other hand,
still allows for efficient constructions while using only sparse,
robot-deployable devices.

Both methods are applicable to real robots, and immediate
future work is to demonstrate this on an actual robot
swarm. We have gathered preliminary (and noisy) results for
the pheromone model in a simulator in which robots use
deployable wireless sensor motes to create a sparse pheromone
graph (as in Hrolenok et al. (2010) and Russell et al. (2015));
this is shown in Figure 6. To use such capabilities in a
real-world scenario, however, will require solutions to a number of
additional issues, including: obstacles in the environment,
dynamic environments where marked areas might be removed,
and environments without a known border.

Abstract

Heuristic search is a core area of Artificial Intelligence,
successfully applied to planning, constraint satisfaction and
game playing. In real-time heuristic search, autonomous
agents interleave planning and plan execution and access the
environment locally, which makes them more suitable for
Artificial Life style settings. Over the last two decades a large
number of real-time heuristic search algorithms have been
manually crafted and evaluated. In this paper we break down
several published algorithms into building blocks and then let a
simulated evolution re-combine the blocks in a performance-based
way. Remarkably, even relatively short evolution runs
result in algorithms with state-of-the-art performance. These
promising preliminary results open exciting possibilities in
the field of real-time heuristic search.

1 Introduction and Related Work 

Artificial Life (ALife) settings afford a researcher an
intuitive testbed to study autonomous agents. In particular,
ALife has been used to study emergence of various cognitive
mechanisms, including a source of rewards (Ackley and
Littman, 1991) in a Reinforcement Learning setting (Sutton
and Barto, 1998). In this paper we propose to use ALife
to study heuristic search algorithms. Heuristic search is a
core area of Artificial Intelligence with a long history (Hart
et al., 1968) and a broad applicability to planning,
game-playing and constraint-optimization tasks. Heuristic search
algorithms take a search graph and start and goal states and
output a path through the graph that connects the start and
the goal. A canonical example is pathfinding on a road map:
one can ask their in-car GPS to find a route from Edmonton
to Cancun. Shorter routes may be preferred and quick
computation times are valued.

We focus on real-time heuristic search: a subclass of
agent-centered algorithms (Koenig, 2001) where the agent
has to act before a full solution to the search problem is
computed and has access only to the environment and data in the
vicinity of the agent’s current state. In other words, plan
execution must be interleaved with the planning process and
there is no global view of the world. These constraints would
be important in the context of a self-driving car: its steering
algorithm needs to issue commands to the steering wheel so
many times per second while the GPS is computing the full
route, regardless of how distant the goal is. Another
application of real-time heuristic search is distributed search such as
routing in ad hoc sensor networks (Bulitko and Lee, 2006).

Starting with LRTA* (Korf, 1990), real-time heuristic
search agents interleave three processes: local planning,
heuristic learning and move selection. In the more than two
decades since LRTA*, researchers have explored different
methods for looking ahead during the planning stage of each
cycle (Koenig and Sun, 2009); different heuristic learning
rules (Bulitko, 2004; Hernandez and Meseguer, 2005; Bulitko
and Lee, 2006; Rayner et al., 2007; Koenig and Sun, 2009;
Rivera et al., 2015); and different move selection mechanisms
(Ishida, 1992; Shue and Zamani, 1993a; Shue and Zamani,
1993b; Shue et al., 2001; Bulitko and Lee, 2006; Hernandez
and Baier, 2012). Finally, information in addition to the
heuristic has been learned during (Bulitko et al., 2007;
Sturtevant et al., 2010; Sturtevant and Bulitko, 2011; Sharon
et al., 2013) and before (Bulitko et al., 2008; Bulitko et al.,
2010; Botea, 2011; Bulitko et al., 2012; Lawrence and Bulitko,
2013) the search.

The number of techniques proposed by the researchers in
the field of real-time heuristic search is overwhelming. More
importantly, the interactions between these techniques are
difficult to analyze empirically (Bulitko and Lee, 2006) or
theoretically (Sturtevant and Bulitko, 2014). In this paper,
we frame the problem of finding a high-performance
combination of real-time heuristic search techniques as a survival
task. To do so we set up a colony of autonomous real-time
heuristic search agents whose genes determine their operation
in their lifetime. Finding high-quality solutions sustains
an agent and eventually allows it to mate and reproduce.
During the reproduction new agents are born, each
with a slightly different genetic code.

Unlike human researchers in the field of real-time heuristic
search, such a simulated evolution has no prior expectations,
intuitions or biases. It simply conducts a randomized
parallel search of a large space of real-time heuristic
search algorithms. Yet, preliminary results in the standard
testbed of pathfinding on video-game maps are promising.
An evolution run on a desktop computer in under a day led
to a new real-time heuristic search algorithm that outperforms
the state of the art. The emergence of such new
high-performance algorithms appears to be a robust phenomenon
as we have repeated the evolution process several times, with
similar results.

However, evolution has two downsides relative to the
traditional process of designing real-time heuristic search
algorithms: (i) it does not prove any theoretical properties of
the new algorithms and (ii) it does not intuitively explain
their performance. Thus, we suggest using the evolution as
a computer-assisted exploratory step, to be followed by
theoretical analysis and additional manual design.

The rest of the paper is organized as follows. Section 2
formally defines the problem we are attempting to solve. We
then review the common framework of heuristic-learning
real-time heuristic search algorithms in Section 3 and
describe the building blocks in Section 4. We conduct a
simulated evolution in a class of real-time heuristic search
algorithms in Section 5 and present the empirical results in
Section 6. We then conclude with directions for future work.

2 Problem Formulation 

In line with previous research, we define a search problem
$\mathcal{S}$ as the tuple $(S, E, c, s_0, s_g, h)$ where $S$ is a finite set of
states and $E \subset S \times S$ is a set of edges between them. $S$ and
$E$ jointly define the search graph which is assumed to be
undirected: $\forall s_a, s_b \in S \, [(s_a, s_b) \in E \implies (s_b, s_a) \in E]$
and has no self-loops: $\forall s \in S \, [(s, s) \notin E]$. The graph is
weighted by the strictly positive edge costs $c : E \to \mathbb{R}^+$
which are symmetric: $\forall s_a, s_b \in S \, [c(s_a, s_b) = c(s_b, s_a)]$.
Two states $s_a$ and $s_b$ are immediate neighbors iff there is
an edge between them: $(s_a, s_b) \in E$; we denote the set
of immediate neighbors of a state $s$ by $N(s)$. A path $P$ is
a sequence of states $(s_0, s_1, \ldots, s_n)$ such that for all
$i \in \{0, \ldots, n-1\}$, $(s_i, s_{i+1}) \in E$. We assume that the search
graph $(S, E)$ is connected (i.e., any two vertices have a path
between them) which makes it safely explorable.
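
The assumptions on the search graph can be checked mechanically. The sketch below represents the edge costs as a dictionary keyed by ordered state pairs; this representation and the function names are assumptions made for illustration, not part of the formulation.

```python
from collections import deque


def neighbors(cost, state):
    """N(s): all states joined to `state` by an edge."""
    return {b for (a, b) in cost if a == state}


def is_valid_search_graph(states, cost, start):
    """Check the stated assumptions: no self-loops, strictly positive and
    symmetric costs (hence an undirected graph), and connectivity."""
    for (a, b), c in cost.items():
        if a == b or c <= 0:
            return False
        if cost.get((b, a)) != c:
            return False
    # Breadth-first search from the start state to test connectivity.
    seen = {start}
    frontier = deque([start])
    while frontier:
        s = frontier.popleft()
        for n in neighbors(cost, s):
            if n not in seen:
                seen.add(n)
                frontier.append(n)
    return seen == set(states)


cost = {("a", "b"): 1.0, ("b", "a"): 1.0,
        ("b", "c"): 2.0, ("c", "b"): 2.0}
valid = is_valid_search_graph({"a", "b", "c"}, cost, "a")
```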

At all times $t \in \{0, 1, \ldots\}$ the agent occupies a single
state $s_t \in S$, called the current state. The state $s_0$ is
the start state and is given as a part of the problem. The
agent can change its current state, that is, move to any
immediately neighboring state in $N(s_t)$. The traversal incurs
a travel cost of $c(s_t, s_{t+1})$. The agent is said to solve the
search problem at the earliest time $T$ it arrives at the goal
state: $s_T = s_g$. The solution is a path $P = (s_0, \ldots, s_T)$:
a sequence of states visited by the agent from the start state
until the goal state. The cumulative cost of all edges in a
solution is called solution cost and is formally defined as
$c_A(\mathcal{S}) = \sum_{t=0}^{T-1} c(s_t, s_{t+1})$ for algorithm $A$. The cost of the
shortest possible path between states $s_a, s_b \in S$ is denoted
by $h^*(s_a, s_b)$. We abbreviate $h^*(s, s_g)$ as $h^*(s)$. We define
suboptimality of the agent on a problem as the ratio of the
solution cost the agent incurred to the cost of the shortest
possible solution: $\alpha(A, \mathcal{S}) = c_A(\mathcal{S}) / h^*(s_0)$. For instance,
suboptimality $\alpha(\text{LRTA*}, \mathcal{S}) = 2$ means that the agent driven by
the LRTA* algorithm found a solution to $\mathcal{S}$ twice as long as
optimal. Lower values are preferred; 1 indicates optimality.

The other performance measure we are concerned with
in this paper is the scrubbing complexity (Huntley and
Bulitko, 2013). It is defined as the average number of state
visits the agent makes while solving a problem. Formally,
let $v_A : S \to \mathbb{N} \cup \{0\}$ be the number of state visits the
agent driven by algorithm $A$ made while solving a problem
$\mathcal{S}$. The scrubbing complexity is then defined over the subset
of states that the agent visited at least once,
$S_{\text{visited}} = \{s \in S \mid v_A(s) \geq 1\}$, as
$\tau(A, \mathcal{S}) = \frac{1}{|S_{\text{visited}}|} \sum_{s \in S_{\text{visited}}} v_A(s)$.
For instance, $\tau(\text{LRTA*}, \mathcal{S}) = 7.5$ means that while
solving problem $\mathcal{S}$, on average the agent driven by the LRTA*
algorithm visited a state 7.5 times (states that were not
visited at all do not contribute to the average). Lower values
of $\tau$ are preferred since re-visiting states tends to look
irrational to an external observer. This is a major reason
why real-time heuristic-search methods are hardly used for
pathfinding in actual video games. Instead, game developers
prefer non-real-time heuristic search such as variants of
path-refinement A* (Sturtevant, 2007).
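
The two performance measures defined above can be computed directly from the sequence of states an agent visited. The following is a minimal sketch, not the authors' code: the function names, the toy corridor and the choice to count the start state as a visit are assumptions made for illustration.

```python
from collections import Counter

def solution_cost(path, cost):
    """Sum of c(s_t, s_{t+1}) over consecutive states of the agent's path."""
    return sum(cost(a, b) for a, b in zip(path, path[1:]))

def suboptimality(path, cost, optimal_cost):
    """alpha: incurred solution cost divided by the optimal cost h*(s_0)."""
    return solution_cost(path, cost) / optimal_cost

def scrubbing(path):
    """tau: mean number of visits per state, over states visited at least once."""
    visits = Counter(path)  # assumption: the start state counts as one visit
    return sum(visits.values()) / len(visits)

# Hypothetical unit-cost corridor 0-1-2: the agent oscillates once
# between states 0 and 1 before reaching the goal state 2.
path = [0, 1, 0, 1, 2]
alpha = suboptimality(path, lambda a, b: 1.0, optimal_cost=2.0)
tau = scrubbing(path)
```

In this toy run the agent travels four edges where two would suffice, and makes five visits spread over three distinct states.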

In its operation the agent has access to a heuristic $h : S \to
[0, \infty)$. The heuristic function is a part of the search problem
specification and is meant to give the agent an estimate of
the remaining cost to go. Unlike much literature in the field,
we do not assume admissibility or consistency of the initial
heuristic but require that $h(s_g) = 0$. The search agent can
modify the heuristic as it sees fit as long as it remains
non-negative and the heuristic of the goal state $s_g$ remains 0. The
heuristic at time $t$ is denoted by $h_t$; $h_0 = h$.

We say that a search agent is real time iff its computation
time between its moves is upper-bounded by a constant
independent of the number of states in the search space. We
will additionally require our search algorithms to be agent
centered (Koenig, 2001) insomuch as they have access to
the heuristic, states and edges only in a bounded vicinity of
the agent’s current state and the bound is independent of the
number of states.

We say that a search agent is complete iff it solves any
search problem as defined above. That is, it is required to
terminate in the goal state $s_g$ at time $T < \infty$. Since we deal
with randomly generated algorithms which may or may not
be complete, we impose an upper bound on their travel cost.
Any algorithm whose suboptimality on a problem exceeds
$\alpha_{\max}$ is said to not solve the problem. We implement this by
monitoring the agent’s travel cost and terminating the agent
as soon as the cost reaches or exceeds $\alpha_{\max} h^*(s_0)$. The resulting
suboptimality is then recorded and contributes to the average
suboptimality of the agent over a set of problems.

The problem we tackle in this paper is to develop a
real-time heuristic search algorithm that has low suboptimality
($\alpha$) and low scrubbing complexity ($\tau$).

3 Basic Real-time Heuristic Search 

As Section 1 presented, many ways of improving on 
LRTA* (Korf, 1990) towards the two measures have been 
proposed. In this paper we specifically focus on heuristic 
learning and movement rules. To isolate the problem, we fix 
the lookahead at 1 (i.e., allow the agent to consider only the 
immediate neighbors of its current state during the planning 
stage) and allow the agent to update its heuristic only in its 
current state. In other words, our local search space and the 
local learning spaces are limited to the agent’s current state. 
With these limitations, LRTA* becomes Algorithm 1. 


Algorithm 1: Basic Real-time Heuristic Search

input : search problem $(S, E, c, s_0, s_g, h)$
output: path $(s_0, s_1, \dots, s_T)$, $s_T = s_g$

1 $t \leftarrow 0$
2 $h_t \leftarrow h$
3 while $s_t \neq s_g$ do
4     $s_{t+1} \leftarrow \arg\min_{s \in N(s_t)} (c(s_t, s) + h_t(s))$
5     $h_{t+1}(s_t) \leftarrow \max\{h_t(s_t), \min_{s \in N(s_t)} (c(s_t, s) + h_t(s))\}$
6     $t \leftarrow t + 1$
7 $T \leftarrow t$


A search agent following the algorithm begins in the start
state $s_0$. It then executes a fixed loop until it reaches the goal
$s_g$ (line 3). At each iteration of the loop, the agent expands
the current state $s_t$ by generating its immediate neighbors
$N(s_t)$ (local planning). It computes its action by selecting
the next state $s_{t+1}$ among the neighbors to minimize the
estimated cost of traveling to the goal through that neighbor
(line 4). Ties among neighbors that have the same $c + h$
values are broken with a tie-breaking schema that is consistent
over state revisits. Then, in line 5, the agent updates (learns)
its heuristic in the current state from $h_t(s_t)$ to $h_{t+1}(s_t)$.
Note that the explicit maximum of the state’s old heuristic
value and the new value causes the heuristic to never
decrease. Such a maximum is unnecessary if the heuristic is
consistent and is commonly omitted in the literature. We do
not assume our heuristic to be consistent and hence put the
maximum in explicitly. The agent then changes its current
state to the neighbor and the cycle repeats.
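
The loop described above can be sketched in a few lines of Python. This is an illustrative reading of Algorithm 1 rather than the authors' implementation: the function name, the callable-based graph interface, the `max_moves` guard and the use of sorted neighbor order as the consistent tie-breaking schema are all assumptions.

```python
def basic_rths(neighbors, cost, h0, start, goal, max_moves=10_000):
    """Lookahead-1 LRTA* (Algorithm 1) with the explicit max in the update."""
    h = {}  # learned values; states not yet updated fall back to h0
    value = lambda s: h.get(s, h0(s))
    state, path = start, [start]
    while state != goal and len(path) <= max_moves:  # guard is an assumption
        # Line 4: pick the neighbor minimising c + h. Sorting makes the
        # tie-breaking deterministic, hence consistent over state revisits.
        nxt = min(sorted(neighbors(state)),
                  key=lambda s: cost(state, s) + value(s))
        # Line 5: mini-min update; the max keeps h from ever decreasing.
        h[state] = max(value(state), cost(state, nxt) + value(nxt))
        state = nxt
        path.append(state)
    return path, h

# Hypothetical unit-cost corridor 0-1-2-3-4 with a zero initial heuristic.
corridor = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
path, learned = basic_rths(corridor.__getitem__, lambda a, b: 1.0,
                           lambda s: 0.0, start=2, goal=4)
```

With an uninformative heuristic the agent first walks away from the goal, raises the heuristic values behind it and only then turns around, which is the state-revisiting behavior that the scrubbing measure penalizes.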

4 Building Blocks 

To define the space of real-time heuristic search algorithms,
we first abstract the base algorithm (Algorithm 1) into a
search algorithm template (Algorithm 2). The template still
has the main loop (line 3) which the agent executes until
it gets to the goal state. Within the loop the agent repeatedly
executes the movement (line 4) and the learning (line 5)
rules. The rules can include the following building blocks.


Algorithm 2: Search Algorithm Template

input : search problem $(S, E, c, s_0, s_g, h)$
output: path $(s_0, s_1, \dots, s_T)$ such that $s_T = s_g$

1 $t \leftarrow 0$
2 $h_t \leftarrow h$
3 while $s_t \neq s_g$ do
4     $s_{t+1} \leftarrow$ new $s$ due to a movement rule
5     $h_{t+1}(s_t) \leftarrow \max\{h_t(s_t), \text{new } h \text{ due to a learning rule}\}$
6     $t \leftarrow t + 1$
7 $T \leftarrow t$


4.1 Movement Rule Building Blocks 

For the movement rule we use line 4 in the base Algorithm 1 
with the following possible blocks on top: 

Backtracking (Shue and Zamani, 1993a,b; Shue et al., 2001;
Bulitko and Lee, 2006)
causes the agent to move back to the previous state on its
path when the heuristic is updated. For simplicity we do
not consider the learning quota $T$ parameter of SLA*T and
LRTS and move the agent back to the previous state as
soon as the heuristic function is updated (i.e., learning takes
place). The intuition of backtracking is that upon detecting
an inaccurate heuristic value the agent should not only
update it in the current state but also in the previous states
whose heuristic values may be dependent on it.

Depression avoidance (Hernandez and Baier, 2012) detects
whether the current state is a part of a heuristic depression
(an area of the state space where the heuristic
values are inaccurately low) and tries to guide the agent
out of such a depression. The problem of heuristic depressions
was identified early on (Ishida, 1992) and linked to
state-revisitation and increased solution cost (Huntley and
Bulitko, 2013). In our building block we use the method
implemented in daLRTA* by Hernandez and Baier (2012).
Specifically, to select a neighbor of the current state as its
next state, the agent considers only the states $N_{\text{min learning}}(s_t)$
where the amount of learning to-date is minimal:

$N_{\text{min learning}}(s_t) = \{s \in N(s_t) \mid |h_0(s) - h_t(s)| = \ell\}$,
$\ell = \min_{s \in N(s_t)} |h_0(s) - h_t(s)|$.    (1)

Then the agent selects its next state $s_{t+1}$ from the set
$N_{\text{min learning}}(s_t)$ in the usual fashion (i.e., the one that
minimizes $c(s_t, s) + h_t(s)$ in line 4, Algorithm 1).
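
As a small illustration of Equation (1), the neighbor filter can be written as follows. This is a sketch under the assumption that the initial and current heuristics are available as dictionaries; the function name and the example values are hypothetical.

```python
def min_learning_neighbors(neighbors, h0, ht):
    """Neighbors whose heuristic has changed the least since time 0 (Eq. 1)."""
    learning = {s: abs(h0[s] - ht[s]) for s in neighbors}
    least = min(learning.values())
    return [s for s in neighbors if learning[s] == least]

# Hypothetical neighborhood: state "a" has been raised by 3, "b" and "c" by 0.5.
h0 = {"a": 1.0, "b": 2.0, "c": 3.0}
ht = {"a": 4.0, "b": 2.5, "c": 3.5}
candidates = min_learning_neighbors(["a", "b", "c"], h0, ht)
```

The movement rule then picks the state with the lowest $c + h$ among `candidates` only, which here excludes the heavily updated state "a".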

Removing expendable states (Sharon et al., 2013) is
meant to reduce the size of the search space by eliminating
a state whose immediate neighbors can be all reached
from each other within the immediate neighborhood. Such
states are called locally expendable. In line with Sharon
et al. (2013) we remove a locally expendable state from the
search graph only when the heuristic of the state is updated
by the agent.
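
One possible reading of the local expendability test described above is a connectivity check on the neighbors of a state, with the state itself excluded. The sketch below follows that reading only; it is an assumption and may differ in detail from the definition of Sharon et al. (2013). The function name and the two toy graphs are hypothetical.

```python
def locally_expendable(state, neighbors):
    """True if the neighbors of `state` are mutually reachable using only
    edges among those neighbors (assumed reading of local expendability)."""
    ring = set(neighbors(state))
    if not ring:
        return True
    seen, frontier = set(), [next(iter(ring))]
    while frontier:
        u = frontier.pop()
        if u in seen:
            continue
        seen.add(u)
        # Stay inside the neighborhood; `state` itself is never in `ring`.
        frontier.extend(v for v in neighbors(u) if v in ring and v not in seen)
    return seen == ring

# Hypothetical graphs: in a triangle every state can be bypassed,
# while the middle state of a three-state corridor cannot.
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
line = {"x": ["y"], "y": ["x", "z"], "z": ["y"]}
```

Under this reading, removing a locally expendable state never disconnects its neighbors from one another, so the agent can still travel between them.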

4.2 Learning Rule Building Blocks 

For the learning rule, we extend the basic mini-min update
of LRTA* (line 5 in the base Algorithm 1) with the following
possible blocks:

Heuristic weighting is meant to accelerate the learning
process and thus discourage the agent from re-visiting states
(a major problem with heuristic depressions). We modify
the weighting rule of Rivera et al. (2015):

$h_{t+1}(s_t) \leftarrow \max\{h_t(s_t), \min_{s \in N(s_t)} (w \cdot c(s_t, s) + h_t(s))\}$    (2)

by moving the weight outside of the $c + h$:

$h_{t+1}(s_t) \leftarrow \max\{h_t(s_t), w \cdot \min_{s \in N(s_t)} (c(s_t, s) + h_t(s))\}$    (3)

which further increases the updates to the heuristic of the
current state. We do so as our preliminary tests have shown
the modified rule to be more robust and thus easier to tune
the weight for. Note that $w > 1$ means that the heuristic may
become inadmissible and inconsistent.

Learning operators other than the min can be used in
the learning rule (3) if admissibility/consistency is not a
requirement. We allow our agents to replace the min with avg,
median or max.
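
The weighted update (3) with a replaceable learning operator can be sketched as follows. The function name, the operator table and the sample $c + h$ values are assumptions for illustration; only the rule itself comes from the text.

```python
from statistics import mean, median

# Operator names follow the text; mapping them to Python callables is an assumption.
LEARNING_OPERATORS = {"min": min, "avg": mean, "median": median, "max": max}

def learn(h_current, f_values, w=1.0, lop="min"):
    """Rule (3) with a pluggable operator: max{h_t(s_t), w * lop(c + h)}."""
    return max(h_current, w * LEARNING_OPERATORS[lop](f_values))

# Hypothetical c + h values of three neighbors of the current state.
f_values = [3.0, 5.0, 10.0]
updates = {lop: learn(2.0, f_values, w=2.0, lop=lop) for lop in LEARNING_OPERATORS}
```

With `w = 1` and `lop = "min"` the function reduces to line 5 of Algorithm 1. Larger weights and the avg, median or max operators raise the heuristic faster, at the price of admissibility.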


Algorithm 3: Real-time Heuristic Search w/ Building Blocks

input : search problem $(S, E, c, s_0, s_g, h)$, control
        parameters $w$, $b$, lop, da, expendable,
        backtrack
output: path $(s_0, s_1, \dots, s_T)$, $s_T = s_g$

1  $t \leftarrow 0$
2  $h_t \leftarrow h$
3  while $s_t \neq s_g$ do
4      if da then
5          $N(s_t) \leftarrow N_{\text{min learning}}(s_t)$
6      $h_{t+1}(s_t) \leftarrow \max\{h_t(s_t), w \cdot \text{lop}_{s \in N^f_b(s_t)} (c(s_t, s) + h_t(s))\}$
7      if expendable & $h_{t+1}(s_t) > h_t(s_t)$ & $\mathcal{E}(s_t)$ then
8          remove $s_t$ from the search graph
9      if backtrack & $h_{t+1}(s_t) > h_t(s_t)$ then
10         $s_{t+1} \leftarrow s_{t-1}$
11     else
12         $s_{t+1} \leftarrow \arg\min_{s \in N(s_t)} (c(s_t, s) + h_t(s))$
13     $t \leftarrow t + 1$
14 $T \leftarrow t$


Lateral learning allows the learning operator (min, avg,
median, max) in the learning update rule (3) to be taken
over a portion of the neighborhood $N(s_t)$. Exploratory
experiments have shown that doing so can improve the agent’s
performance. We define the portion using a parameter $b$
called beam width. Specifically, the partial neighborhood
$N^f_b$ of a state $s_t$ is defined as the $b$ fraction of the
neighborhood $N(s_t)$ with the lowest $f$ values:

$N^f_b(s_t) = (s^1, \dots, s^{\lfloor b |N(s_t)| \rfloor})$    (4)

where $(s^1, \dots, s^{\lfloor b |N(s_t)| \rfloor}, \dots, s^{|N(s_t)|})$ is the immediate
neighborhood sorted in the ascending order by their $f = c + h$.
For instance, $s^1$ has the lowest $f(s^1) = c(s_t, s^1) + h(s^1)$
value in the set $\{f(s) \mid s \in N(s_t)\}$ whereas $s^{|N(s_t)|}$ has
the highest $f$ value in that set. Clearly, for $b = 1$ we get the
full neighborhood: $N^f_1(s_t) = N(s_t)$. For $b = 0$ we define
$N^f_0(s_t)$ as the neighbor with the lowest $f$: $\{s^1\}$.
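
Equation (4) amounts to sorting the neighbors by $f$ and keeping a prefix. The sketch below illustrates this; the function name and sample values are hypothetical. The text defines the $b = 0$ case explicitly but does not say what happens when $0 < b$ and $\lfloor b |N(s_t)| \rfloor = 0$, so keeping at least one neighbor in that case is an assumption.

```python
import math

def beam_neighborhood(neighbors, f, b):
    """The b fraction of the neighborhood with the lowest f values (Eq. 4)."""
    ordered = sorted(neighbors, key=f)
    # Assumption: always keep at least the best neighbor, which also
    # covers the b = 0 case defined in the text.
    keep = max(1, math.floor(b * len(ordered)))
    return ordered[:keep]

# Hypothetical f = c + h values of four neighbors.
f_values = {"a": 4.0, "b": 1.0, "c": 3.0, "d": 2.0}
half = beam_neighborhood(f_values, f_values.get, b=0.5)
full = beam_neighborhood(f_values, f_values.get, b=1.0)
best = beam_neighborhood(f_values, f_values.get, b=0.0)
```

The learning operator is then applied to the $f$ values of the returned states only, so a narrow beam makes avg, median and max behave more like min.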

4.3 Putting the Building Blocks Together 

With these building blocks, the template Algorithm 2
becomes Algorithm 3. The main loop is the same as before
(line 3). Inclusion of the building blocks is determined
by the control parameters $w$, $b$, lop, da, expendable,
backtrack as follows. If the depression avoidance block
is present in the agent (da = true) then line 5 temporarily
sets the neighborhood to only the states where the amount
of learning $|h_t(s) - h_0(s)|$ is minimal. The learning rule
in line 6 covers heuristic weighting, learning operator and
lateral learning, using the control parameters $w$, lop and $b$.

If the expendable block is present in the agent
(expendable = true) then in line 8 the current state is
removed from the graph if there was learning in it and it
is indeed locally expendable (denoted by the predicate $\mathcal{E}$).
Finally, if the agent learned in the current state and the
backtracking block is present (backtrack = true) then the
agent will move back to the previous state in line 10.
Otherwise it moves forward in line 12. If there is no previous
state (i.e., $s_t = s_0$) then the agent stays put.

If at any time the neighborhood $N(s_t)$ becomes empty
(i.e., the agent has no moves to pick from) then the agent
quits without producing a solution. Such unsolved problems
contribute $\alpha_{\max} h^*(s_0) + \min_{s_a, s_b \in S} c(s_a, s_b)$ to the agent’s
statistics on the solution cost/suboptimality.

In the rest of the paper we compactly denote any such
algorithm by listing its building blocks as
$w \cdot \text{lop}_b(c + h)$+backtrack+da+E where the last three parts are optional.

5 Simulated Evolution 

We had briefly experimented with an ERL-style
asynchronous evolution (Ackley and Littman, 1991) without an
explicit fitness function or generations. However, given our
current heuristic search code base, it was computationally
prohibitive and we switched to the traditional style
evolution with discrete generations and an explicit fitness
function. Section 7 briefly discusses the alternative.

The simulated evolution proceeds as per Algorithm 4. It
starts in line 2 with forming the initial population $P_0$ of $K$
agents. Each agent is represented by its gene which encodes
the building blocks used by the agent. Technically the gene
is a vector ($w$, $b$, lop, da, expendable, backtrack).
Each of the six components of the initial agents’ genes is picked
uniformly randomly from its respective range. Some of
the ranges are domain specific and we will list them in
Section 6.2. Binary gene components (e.g., da which takes on
the values of true and false) are represented as a
continuous value in [0, 1]. The algorithm converts it to binary by
rounding. For instance, an agent with the da gene of 0.4 will
not have the depression avoidance block as round(0.4) = 0.
However, an agent with the da gene of 0.7 will perform
depression avoidance as round(0.7) = 1. The same schema is
used for the learning operator lop gene which takes on a
continuous value in [1, 4] but is rounded to the nearest discrete
value inside the search agent to select the operator (1, 2, 3, 4
encode min, avg, median and max correspondingly).

Algorithm 4: Evolution of Search Agents

input : search problems, batch size $B$, max suboptimality
        $\alpha_{\max}$, population size $K$, number of generations $M$
output: genome of $p_{\text{oldest}}$

1 $t \leftarrow 0$
2 create population $P_0$ of size $K$ with random genes
3 for $t = 1, \dots, M$ do
4     for $p \in P_{t-1}$ do
5         $f(p) \leftarrow \alpha(p)$ over $B$ problems, truncated at $\alpha_{\max}$
6     sort $P_{t-1}$ by $f$
7     $C \leftarrow \text{children}(P_{t-1}(1, \dots, K/2))$
8     $P_t \leftarrow P_{t-1}(1, \dots, K/2) \cup C$
9     update $p_{\text{oldest}}$
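
The decoding of a continuous gene into control parameters can be sketched as follows. The function name and the dictionary layout are assumptions; rounding half up is also an assumption, since the text only says "rounding". The raw values of the three binary genes in the example are made up so that the decoded agent matches the form $w \cdot \min_b(c + h)$+E.

```python
import math

OPERATORS = {1: "min", 2: "avg", 3: "median", 4: "max"}

def round_half_up(x):
    """Round to the nearest integer; ties go up (an assumption)."""
    return int(math.floor(x + 0.5))

def decode(gene):
    """Turn the vector (w, b, lop, da, expendable, backtrack) into parameters."""
    w, b, lop, da, expendable, backtrack = gene
    return {
        "w": w,
        "b": b,
        "lop": OPERATORS[round_half_up(lop)],
        "da": bool(round_half_up(da)),
        "expendable": bool(round_half_up(expendable)),
        "backtrack": bool(round_half_up(backtrack)),
    }

# Hypothetical raw gene: lop = 1.2 rounds to min, da = 0.4 to false,
# expendable = 0.7 to true and backtrack = 0.1 to false.
params = decode((8.223, 0.341, 1.2, 0.4, 0.7, 0.1))
```

Keeping every gene continuous lets the same crossover and mutation operators act on all six components, with the discretization deferred to the search agent.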

During each of the $M$ generations each agent of the
population is evaluated on $B$ problems picked randomly from
search problems given to the evolution. The fitness of the
individual $p$ is denoted by $f(p)$ and is the average
suboptimality of the agent $p$ on the $B$ problems (line 5). Each run
is truncated if the agent exceeds the suboptimality $\alpha_{\max}$ as
explained earlier in the paper.

The population is then sorted by the agents’ fitness in
line 6. The top half of the population, $P_{t-1}(1, \dots, K/2)$,
is included in the next generation’s population $P_t$ and is
allowed to reproduce. The bottom half is removed and is
replaced by $K/2$ children in line 8.

The set of children $C$ is created by randomly picking two
parents from the top half of $P_{t-1}$ and performing a
crossover on their genes in line 7. In other words, each gene
value of the child has a 50% chance of coming from
the father and a 50% chance of coming from the mother. The
child’s gene values are then
mutated by adding a Gaussian noise of zero mean and the
standard deviation of 1/100th of the size of the gene range
(e.g., $(4 - 1)/100 = 0.03$ for the lop gene). If adding
Gaussian noise pushes the gene value outside of the valid
range then the value is clipped. For instance, if a child
inherited lop = 4 from its parent but the mutation noise made
it $4 + 0.03 = 4.03$ then it will be brought back down to 4.
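
A minimal sketch of the crossover, mutation and clipping steps described above follows. The function name and the two parent genes are hypothetical. The ranges for $b$, lop and the three binary genes come from the text, while the range [1, 10] for the weight $w$ is an assumption borrowed from the systematic search in Section 6.3.

```python
import random

# Order: w, b, lop, da, expendable, backtrack. The w range is an assumption.
GENE_RANGES = [(1.0, 10.0), (0.0, 1.0), (1.0, 4.0),
               (0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]

def make_child(mother, father, ranges, rng):
    """Uniform crossover, Gaussian mutation, then clipping to the gene range."""
    child = []
    for m, f, (lo, hi) in zip(mother, father, ranges):
        value = m if rng.random() < 0.5 else f        # each parent equally likely
        value += rng.gauss(0.0, (hi - lo) / 100.0)    # sigma = 1/100 of the range
        child.append(min(hi, max(lo, value)))         # clip, e.g. 4.03 -> 4
    return child

rng = random.Random(1)
child = make_child([8.0, 0.3, 4.0, 1.0, 0.0, 0.5],
                   [2.0, 0.9, 1.0, 0.0, 1.0, 0.5], GENE_RANGES, rng)
```

Because the mutation scale is tied to the width of each range, all six genes are perturbed by the same relative amount regardless of their units.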

Finally, line 9 updates the running oldest agent $p_{\text{oldest}}$ (i.e.,
the one who has survived the most generations so far) with
an older one, if such is found in the current generation. The
genome of the earliest oldest agent is the output of the
evolution. Ties between agents of the same age are broken in
the favor of lower suboptimality.
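
The overall loop of Algorithm 4 can be sketched generically, with the gene sampler, the fitness function and the child operator passed in as callables. Everything below is an illustration under assumptions: the function and key names are hypothetical, an agent's age is taken to be the number of generations in which it stayed in the top half, and the toy fitness function stands in for the average suboptimality over $B$ search problems.

```python
import random

def evolve(random_gene, fitness, make_child, K, M, rng):
    """Sketch of Algorithm 4: truncation selection with oldest-agent tracking."""
    population = [{"gene": random_gene(rng), "age": 0} for _ in range(K)]
    oldest = None
    for generation in range(M):
        for agent in population:
            agent["f"] = fitness(agent["gene"], rng)       # line 5, lower is better
        population.sort(key=lambda agent: agent["f"])       # line 6
        survivors = population[: K // 2]
        for agent in survivors:
            agent["age"] += 1                               # survived this generation
        children = []
        for _ in range(K - len(survivors)):                 # line 7
            mother, father = rng.sample(survivors, 2)
            gene = make_child(mother["gene"], father["gene"], rng)
            children.append({"gene": gene, "age": 0})
        # Line 9: survivors are sorted by fitness, so among equally old agents
        # the fitter one is seen first; only a strictly older agent replaces
        # the record, which keeps the earliest oldest agent.
        for agent in survivors:
            if oldest is None or agent["age"] > oldest["age"]:
                oldest = dict(agent)
        population = survivors + children                   # line 8
    return oldest

# Toy stand-in: a one-number gene in [0, 1] with a noisy fitness minimised at 0.3.
best = evolve(
    random_gene=lambda r: r.random(),
    fitness=lambda g, r: abs(g - 0.3) + r.gauss(0.0, 0.05),
    make_child=lambda m, f, r: min(1.0, max(0.0, (m if r.random() < 0.5 else f)
                                            + r.gauss(0.0, 0.01))),
    K=20, M=30, rng=random.Random(0))
```

Selecting by age rather than by a single fitness reading favors agents whose noisy fitness estimates were repeatedly good, which is the motivation given for this output criterion later in the paper.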

6 Empirical Evaluation 

The traditional testbed for real-time heuristic search is
pathfinding in video games, where the planning time per
move is limited to a few milliseconds for all simultaneously
operating agents. The standard way (Sturtevant, 2012) of
representing game maps is as a two-dimensional discrete
grid where each grid cell is either available for the agent
to pass through (i.e., vacant, shown white in Figure 1) or
blocked by an obstacle (black). At each moment of time,
the agent occupies a single vacant cell of the map which
determines the agent’s current state. The agent changes its
state by moving from its current grid cell to one of its vacant
neighbors, incurring a travel cost. In this paper we use the
standard eight-connected maps where cardinal moves cost 1
and diagonal moves cost $\sqrt{2}$. A problem is solved when the
agent enters the goal cell. At the beginning of each problem,
the agent starts with the octile distance as the heuristic.
Octile distance is the cost of the shortest path between a given
cell and the goal cell if no cells are blocked by obstacles.
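
For reference, the octile distance on an eight-connected grid with move costs 1 and $\sqrt{2}$ has a simple closed form. The sketch below assumes cells are given as (column, row) integer pairs; the function name is hypothetical.

```python
import math

def octile_distance(cell, goal):
    """Shortest-path cost on an obstacle-free 8-connected grid."""
    dx = abs(cell[0] - goal[0])
    dy = abs(cell[1] - goal[1])
    # Move diagonally min(dx, dy) times, then straight for the remainder.
    return max(dx, dy) + (math.sqrt(2) - 1) * min(dx, dy)

straight = octile_distance((0, 0), (3, 0))
diagonal = octile_distance((0, 0), (2, 2))
mixed = octile_distance((0, 0), (3, 1))
```

The value is zero at the goal cell, as the problem formulation requires of the initial heuristic.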



Figure 1: A video-game map from Dragon Age: Origins. 

We implemented all algorithms in a mixture of C and
MATLAB code run on Intel i5/i7-based desktop computers.
Parts of the code were run in parallel on 4-6 CPU cores.

6.1 Problem Set 

We used the benchmark problems from the Moving AI
set (Sturtevant, 2012). We treated the water terrain type as
obstacle and excluded all problems which thereby became
unsolvable (e.g., the start state is in an obstacle cell). This
resulted in 493298 problems situated on 342 maps. The maps
were from the video games StarCraft, WarCraft III, Baldur’s
Gate II (maps scaled up to 512 x 512) and Dragon Age:
Origins (Figure 1).

6.2 Evolution 

We conducted three evolution runs. The first run was of 50
generations each of 200 agents. Each agent of each
generation was evaluated on 200 random problems. The maximum
suboptimality $\alpha_{\max}$ was set to 1000. The maximum age of
any agent was 14 generations and it was first achieved in
generation 47 by the agent $8.223 \cdot \min_{0.341}(c + h)$+E. The
run took approximately 6 hours, with the agents solving 92
problems/second.

The second run was of 25 generations each of 200 agents.
Each agent of each generation was evaluated on 400
random problems. The maximum suboptimality $\alpha_{\max}$ was set
to 1000. The maximum age of any agent was 11
generations and it was first achieved in generation 12 by the agent
$7.952 \cdot \min_{0.720}(c + h)$+da+E. The run took approximately
5 hours, with the agents solving 110 problems/second.

The third run was of 200 generations each of 100 agents.
Each agent of each generation was evaluated on 100
random problems. The maximum suboptimality $\alpha_{\max}$ was set
to 1000. The maximum age of any agent was 16
generations and it was first achieved in generation 72 by the agent
$8.061 \cdot \text{avg}_{0.029}(c + h)$+E. The run took approximately 9
hours, with the agents solving 63 problems/second.

We evaluated the resulting algorithms on non-overlapping
sets of 10000 random problems. The suboptimality cutoff
was set to $\alpha_{\max} = 10^5$ which allowed all of them to solve
all problems. The results are found in Table 1 and suggest
that the evolved individuals are similar. For the subsequent
evaluation we picked the $8.223 \cdot \min_{0.341}(c + h)$+E algorithm.


Table 1: Results of the three evolution runs. Means and
standard errors of the mean are listed.

Algorithm                                  Suboptimality $\alpha$    Scrubbing $\tau$
$8.223 \cdot \min_{0.341}(c + h)$+E        20.70 ± 0.4782            1.17 ± 0.0036
$7.952 \cdot \min_{0.720}(c + h)$+da+E     21.17 ± 0.5220            1.18 ± 0.0037
$8.061 \cdot \text{avg}_{0.029}(c + h)$+E  21.35 ± 0.5366            1.18 ± 0.0038


6.3 Systematic Search 

We also ran systematic search in the space of the possible
algorithms. To do so, we tabulated the weight space $w$ in
15 increments from 1 to 10. We tabulated the beam width
$b$ in 15 increments from 0 to 1. For each of those
combinations we tried all four learning rule operators (min, avg,
median and max) and all three movement rule blocks:
depression avoidance, backtracking and removing expendable
states. In total we created $15 \times 15 \times 4 \times 2 \times 2 \times 2 = 7200$
agents. Each of them was run on the same 100 random
problems. The three agents with the lowest suboptimality
were: $\max_{0.714}(c + h)$+E, $\max_{0.143}(c + h)$+da+E and
$2.286 \cdot \text{avg}_{0.286}(c + h)$+da+E. Their suboptimalities over 100
problems were 14.56, 14.87 and 15.14 respectively.
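
The enumeration of the 7200 configurations can be sketched with a Cartesian product. The exact grid is not spelled out in the text; the sketch assumes 15 evenly spaced values including both endpoints, which is consistent with the reported parameter values 2.286, 0.714, 0.143 and 0.286. Variable names are hypothetical.

```python
from itertools import product

# Assumption: 15 evenly spaced grid points, endpoints included.
weights = [1 + 9 * k / 14 for k in range(15)]   # w from 1 to 10
beams = [k / 14 for k in range(15)]             # b from 0 to 1
operators = ["min", "avg", "median", "max"]
switch = [False, True]                          # da, backtrack, expendable

configurations = list(product(weights, beams, operators, switch, switch, switch))
```

Each tuple in `configurations` corresponds to one agent $w \cdot \text{lop}_b(c + h)$ with an optional combination of the three movement rule blocks.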


As before, we evaluated the resulting algorithms on
non-overlapping sets of 10000 random problems. The
suboptimality cutoff was again set to $\alpha_{\max} = 10^5$ which allowed
all three algorithms to solve all problems. The results are
found in Table 2 and suggest that the evaluation on the 100
problems during the systematic search is not representative
of the algorithms’ true performance. For instance, the
second algorithm was selected for its second-best
suboptimality of 14.87 as measured on the 100 problems. However,
as measured on 10000 problems, it performed substantially
worse (suboptimality of 30.44).


Table 2: Results of the three best algorithms found by the
systematic search. Means and standard errors of the mean are listed.

Algorithm                                     Suboptimality $\alpha$    Scrubbing $\tau$
$\max_{0.714}(c + h)$+E                       19.56 ± 0.5237            1.16 ± 0.0035
$\max_{0.143}(c + h)$+da+E                    30.44 ± 1.4305            1.22 ± 0.0075
$2.286 \cdot \text{avg}_{0.286}(c + h)$+da+E  20.47 ± 0.5109            1.13 ± 0.0026


Such misleading estimates of the algorithm’s performance
appear a necessary downside of the systematic search.
Indeed, to evaluate all 7200 algorithms in a reasonable amount
of time, each can be evaluated only on a small set of
problems. In contrast, our evolution looks for the oldest
individuals which had to remain in the top half of the performance in
each generation to make it to the next. For instance, the best
algorithm found by the evolution, $8.223 \cdot \min_{0.341}(c + h)$+E,
lasted for 14 generations and thus had its performance
evaluated on $14 \times 200$ problems.

For the subsequent evaluation we picked
$\max_{0.714}(c + h)$+E as the output of the systematic search.

6.4 Competing Algorithms 

We selected two groups of published algorithms as
competitors. In the first group, we have LRTA* (Korf,
1990) and its weighted variant wLRTA* (Rivera et al.,
2015). To select the weight in wLRTA*, we ran
$w \in \{1, 2, 3, 4, 5, 6, 7, 8, 16, 32, 64, 128, 256, 512, 1024, 2048\}$
on 6000 non-overlapping problems each and found that
$w = 128$ gave the lowest suboptimality.

In the second group we have existing algorithms
that explicitly aim to escape heuristic depressions
quicker than LRTA*. Specifically, we evaluated the
myopic versions (lookahead of 1) of aLRTA*,
daLRTA* (Hernandez and Baier, 2012), its weighted version
w-daLRTA* (Rivera et al., 2015) and its combination
with removing expendable states, daLRTA*+E (Sharon
et al., 2013). We did not include f-LRTA* or f-LRTA*+E
in our evaluation since both versions were found
inferior to daLRTA* by Sharon et al. (2013). To select
the weight parameter for w-daLRTA*, we ran
$w \in \{1, 2, 3, 4, 5, 6, 7, 8, 16, 32, 64, 128, 256, 512, 1024, 2048\}$
on 6000 non-overlapping problems each and found that
$w = 7$ gave the lowest suboptimality.

6.5 Competition Results

We ran the eight algorithms, each on 30000 non-overlapping
problems randomly selected from the benchmark set. For
each algorithm we measured its suboptimality $\alpha$ and
scrubbing $\tau$. The suboptimality cutoff was set to $10^5 \cdot h^*(s_0)$ and
all algorithms solved all problems under the cutoff. For the
systematic search we used algorithm $\max_{0.714}(c + h)$+E.
For the evolution we used $8.223 \cdot \min_{0.341}(c + h)$+E. The
means and standard errors of the mean are listed in Table 3.


Table 3: Performance of the algorithms: sample mean ±
standard error of the mean.

Algorithm            Suboptimality $\alpha$    Scrubbing $\tau$
LRTA*                455.58 ± 7.2984           16.24 ± 0.1423
aLRTA*               349.21 ± 6.3649           11.04 ± 0.1175
daLRTA*              44.98 ± 0.7425            1.98 ± 0.0104
wLRTA*               39.81 ± 0.5458            1.83 ± 0.0078
wdaLRTA*             30.95 ± 0.4521            1.48 ± 0.0050
daLRTA*+E            31.82 ± 0.7782            1.21 ± 0.0041
systematic search    19.75 ± 0.2836            1.17 ± 0.0020
evolution            20.79 ± 0.2809            1.17 ± 0.0021


The new algorithms found by evolution and systematic
search appear to produce shorter solutions and scrub less
than the classical and contemporary competing algorithms.

Compared to the existing algorithms, the new
algorithms use a combination of more aggressive learning rules,
$\max_{0.714}(c + h)$ or $8.223 \cdot \min_{0.341}(c + h)$, with marking
expendable states. Naturally, either of these combinations
could have been found manually by the researchers but since
it is time-consuming and tedious to try all combinations of
building blocks, researchers tend to focus on a few, guided
by their experience and intuition. Evolution and systematic
search lack either and operate entirely based on algorithm
performance (within the human-defined algorithm space).

7 Future Work 

The promising preliminary results reported above open a
few avenues for future work. First, while we tested the
evolved algorithms on problems that they may not have seen
during the evolution process, it is still possible that the
resulting algorithms work so well merely because they overfit
to the video-game pathfinding maps. Thus, the next step
is to explore how effective the process is in other real-time
heuristic search domains and across domains.

Second, additional real-time heuristic search techniques
such as deeper lookahead of LSS-LRTA* (Koenig and Sun,
2009) can be made available to the evolution as building
blocks. To make the competition fair, additional planning
time per move should be incorporated into the fitness
function. Another possibility to curb computational complexity
is to assume that computations are subject to errors. For
instance, one can introduce access errors to the heuristic
which is realistic when heuristic values are stored in the
environment (à la pheromone in ant-colony algorithms) and
the agents may not always read/write them perfectly. Then
robust algorithms are likely to evolve (Ackley, 2013).

Finally, we can attempt to evolve robust search agents
specifically for a class of problems. Following Ackley and
Small (2014) we can have the agents continuously
evolving in an asynchronous environment, similarly to the ERL
testbed of Ackley and Littman (1991). In such a setting
there would be no discrete generations and no explicit fitness
function. Instead, each agent will maintain a health value
which is depleted during the search and replenished when
the agent reaches its goal, before taking on the next search
problem. An agent which lived long enough and healthy
enough is given an opportunity to reproduce. The
domain-specificity would allow us not only to evolve the algorithms
but also their innate heuristic function which can be encoded
in the genes using function approximation.

8 Conclusions 

We presented possibly the first application of evolutionary
search to real-time heuristic search algorithms. We did so by
breaking down classical and contemporary real-time
heuristic search algorithms into building blocks for the evolution
to pick from. In a large-scale evaluation in video-game
pathfinding, evolved real-time heuristic search agents
outperformed manually designed published algorithms.

Abstract

Written responses can provide a wealth of data in
understanding student reasoning on a topic. Yet they are time- and
labor-intensive to score, requiring many instructors to forego them
except as limited parts of summative assessments at the end of
a unit or course. Recent developments in Machine Learning
(ML) have produced computational methods of scoring
written responses for the presence or absence of specific concepts.
Here, we compare the scores from one particular ML program
- EvoGrader - to human scoring of responses to structurally-
and content-similar questions that are distinct from the ones
the program was trained on. We find that there is
substantial inter-rater reliability between the human and ML scoring.
However, sufficient systematic differences remain between
the human and ML scoring that we advise only using the ML
scoring for formative, rather than summative, assessment of
student reasoning.

Background 

The central importance of evolution to teaching and
learning in the biological sciences has been clearly established
in all science education reform (States, 1900; Brewer and
Smith, 2011). Adequate formative assessment instruments -
administered during the course of instruction to gauge
student understanding and reasoning in order to provide
feedback for future instruction, instead of to assign a grade at the
end of a unit - that measure student understanding of
evolutionary concepts (Bishop and Anderson, 1990; Anderson
et al., 2002), however, have until recently been rather
limited (Nehm and Schonfeld, 2008). Part of the challenge in
designing an effective instrument comes from the fact that
student understanding of evolutionary concepts is complex,
and constantly changing. Studies find that students hold
both scientifically accurate and naive or non-scientific
explanations simultaneously (Andrews et al., 2012; Hiatt et al.,
2013) and that accurately identifying alternative conceptions
can be difficult (Rector et al., 2012). Data also suggest
students reason differently than experts, especially in response
to different contextual elements of the sample questions.
Undergraduates employ more naive concepts when
applying explanations of natural selection to plants as compared
to animals; trait loss as compared to trait gain; and
unfamiliar taxa as compared to familiar taxa (Nehm and Ha, 2011).
Furthermore, ascertaining the meaning of student responses
is often very difficult. One study found that 81 percent of
students incorporated lexically ambiguous language in their
responses to open-ended questions about evolutionary
mechanisms (Rector et al., 2012).

Despite these challenges, assessing student knowledge is
important, particularly in evaluating pedagogical practices
designed to improve student understanding. In an effort to
identify effective assessment strategies, we have been
investigating the applicability of a new tool, EvoGrader
(Moharreri et al., 2014).

Open-ended student responses can provide a wealth of 
data about student reasoning. Unfortunately, they can also 
be time- and labor-intensive to score. One study found that 
it took an average of four minutes for a human grader to 
score a single response for the nine ideas we analyze in 
this study (Moharreri et al., 2014). For even a class of 30 
students, scoring five such questions would take ten hours, 
which quickly becomes prohibitive. If an instructor wants to 
get a general sense of student understanding on a formative 
assessment, a more rapid method is highly desirable. 
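
The ten-hour estimate above is simple arithmetic, sketched here in Python. The function name and its parameters are illustrative assumptions; only the four-minute figure, the class of 30 and the five questions come from the text.

```python
def grading_hours(students, questions, minutes_per_response=4.0):
    """Estimated hours for one human grader to score every response."""
    return students * questions * minutes_per_response / 60.0


# 30 students x 5 questions x 4 minutes = 600 minutes.
hours = grading_hours(30, 5)
print(hours)
```

The same function makes it easy to see how quickly the cost grows with class size, for example `grading_hours(300, 5)`.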
An appealing potential solution to this problem would
be an automated system that is sufficiently
sophisticated to evaluate student answers to such
open-ended questions. Of course, this is not a simple task.
Even setting aside the difficulty of parsing open-ended natural
language responses in general, one still has the further
problem of interpreting the appropriateness of answers in relation
to content knowledge and overarching concepts. For
instance, a science teacher may want to know whether a
student’s response demonstrates incorrect naive notions or
whether it demonstrates concrete scientific understanding.
Machine Learning systems have begun taking the first steps
toward accomplishing this difficult task.

Use of Machine Learning in Education 

There is growing interest in using tools and techniques from
Machine Learning in the classroom environment (Butler
et al., 2014). In fact, a chapter has been written about using
Machine Learning in educational science (Kidzinski et al.,
2016) within the context of a book on educational technologies.
One area of particular interest is language processing.
Machine Learning techniques have been used to classify instructor
questions according to Bloom’s taxonomy (Yahya
et al., 2013). Perhaps the biggest use of Machine Learning
in an educational environment is in the automated scoring of
student writing (reviewed in Nehm et al., 2012b).

One domain-specific example of ML techniques in language
processing is provided by the web portal EvoGrader,
discussed below. EvoGrader was designed to assess student
understanding of natural selection using a particular set of
questions, each consisting of a brief scenario and asking
students how a biologist would explain this scenario of evolutionary
change or patterns. Our study seeks to measure
how closely the scores from this ML procedure match human
scoring for questions on which the application has not been
trained but which are written in the same style.

EvoGrader 

EvoGrader (http://www.evograder.org) is a free,
online service that analyzes open-ended responses to questions
about evolution and natural selection, and provides
users with formative assessments. It is described in detail
in (Moharreri et al., 2014), but a brief description follows.

EvoGrader works by supervised machine learning. Participants
(n=2,978) wrote responses to ACORNS assessment
items (Nehm et al., 2012a) and ACORNS-like items (Bishop
and Anderson, 1990), generating 10,270 student responses.
These items consist of a prompt describing a short scenario
relevant to natural selection, and ask students to write how a
biologist would explain this situation. Participants spanned
many different levels of expertise, including non-majors, undergraduate
biology or anthropology majors, graduate students,
postdocs, and faculty in evolutionary science. Each
response was scored independently by two human raters for
each of six Key Concepts (KC) and three Naive Ideas (NI)
(see Box 1). These consensus scores were used to train EvoGrader,
based on the supervised machine learning tools of
LightSIDE (Mayfield and Rose, 2013). LightSIDE provides
feature extraction, model construction, and model validation,
based on the human-scored responses.

EvoGrader’s authors chose different methods to optimize
feature extraction for each of the 9 scoring
models (one model for each concept). All considered
the dictionary of words used in a particular response and
reduced words to their stems; most removed high-frequency,
low-information words (e.g., the, of, and, it); some also included
pairs of consecutive words (e.g., “had to”, “passing
on”) or removed misclassified data (see Table 2 of
Moharreri et al., 2014, for details).
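
The feature-extraction choices described above can be illustrated with a minimal Python sketch. This is not LightSIDE or EvoGrader code: the stop-word list contains only the four examples given in the text, `crude_stem` is a hypothetical stand-in for a real stemmer, and word pairs are formed after stop-word removal, which is an assumption.

```python
import re
from collections import Counter

# Assumption: only the four example stop words quoted in the text.
STOP_WORDS = {"the", "of", "and", "it"}


def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_features(response, use_bigrams=True, drop_stop_words=True):
    """Return frequencies of stemmed words and, optionally, word pairs."""
    tokens = re.findall(r"[a-z']+", response.lower())
    if drop_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    stems = [crude_stem(t) for t in tokens]
    features = Counter(stems)
    if use_bigrams:
        # Pairs of consecutive (stemmed) words, e.g. "had to".
        features.update(" ".join(pair) for pair in zip(stems, stems[1:]))
    return features


features = extract_features(
    "The bacteria had to adapt, passing on the resistance genes."
)
```

Each of the nine models could use a different combination of the two flags, which mirrors the per-concept tuning described in the text.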

After feature extraction, each response was converted to a
set of vectors containing frequencies of words or word pairs.
These vectors were then passed to a binary classifier, which
was trained by Sequential Minimal Optimization (SMO) (Platt,
1999) for each of the 9 models. The SMO training algorithm
iteratively assigned weights to words in the written responses
until the model was able to match the human scores
within a certain margin of error. The models were then validated
with 10-fold cross-validation, using 90% of the data
to generate a model and the remaining 10% of the data to
validate it, and then repeating this procedure a total of 10
times such that each 10% of the data was used for validation
exactly once and for model generation 9 times. The authors averaged
these models to get the final models used by the program,
assessing whether they met quality benchmarks (90%
accuracy and kappa coefficients > 0.8) defined by the creators,
and adjusting the training until the models did.
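
The 10-fold procedure described above can be sketched as follows. This illustrates only the data splitting, not the SMO training itself; the function name, the shuffling seed and the toy data size of 50 are assumptions.

```python
import random


def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train_indices, validation_indices) for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Count how often each of 50 toy responses is used in each role.
validation_counts = [0] * 50
training_counts = [0] * 50
for train, validation in k_fold_splits(50):
    for idx in validation:
        validation_counts[idx] += 1
    for idx in train:
        training_counts[idx] += 1
```

Every response ends up in the validation set exactly once and in the training set nine times, as the text states.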

EvoGrader uses these validated models to score new responses
from web users. Users must upload data in a specific
format, which the portal verifies. If the data is formatted
correctly, EvoGrader then evaluates each response using
the existing validated models, and provides both machine-scored
data in a downloadable .csv format and a variety of
web visualizations of the data (Fig. 1).

Methods 

Student data 

We administered pre-instruction and post-instruction tests 
consisting of two questions (see Box 1) about evolution 
to students in an Introductory Cell and Molecular Biology 
course in the fall semester of 2014. Both questions asked 
students about how evolutionary processes occur. Question 
1 asks about an evolutionary gain of antibiotic resistance in 
a population, while question 2 asks about the evolutionary 
loss of toxicity in a mushroom population. Completed pre- 
and post-test responses were obtained from 34 students for 
question 1 and from 36 students for question 2. 
Figure 1: Concept maps produced by EvoGrader for the pre-instruction
(upper panel) and post-instruction (lower panel)
analysis of Question 1 (see Box 1). Sizes of the circles
indicate the percentage of responses scored as containing that
concept; widths of the lines connecting concepts show the frequency
of co-occurrence of those concepts.


Box 1 

We evaluated student responses to two prompts: 

Question 1: Explain how a microbial population evolves 
resistance to the effects of an antibiotic. 

Question 2: A species of mushroom contains a chemical
that is toxic to mammals. How would biologists explain the
initial occurrence and increase in frequency of a number
of individuals in the population that no longer produce this
toxin?

We scored each response for whether it contained each of 
the following concepts: 


Key Concepts:

• Variation: The presence and causes (mutation/recombination/sex)
of differences among individuals
in a population.

• Heritability: Traits that have a genetic basis and are able
to be passed on from parent to offspring.

• Competition: A situation in which two or more individuals
struggle to get resources which are not available to
everyone.

• Limited Resources: Required resources for survival (food,
mates, water, etc.) which are not available in unlimited
amounts.

• Differential Survival: Differential survival and/or reproduction
of individuals.

• Non-adaptive Ideas: Genetic drift and related non-adaptive
factors contributing to evolutionary change.

Naive Ideas:

• Adapt: Organisms/populations adjust or acclimate to their
environment.

• Need: Organisms gain traits or advantage in response to a
need or a goal to accomplish something.

• Use/Disuse: Traits are lost or gained due to use or disuse
of traits.

Further, human evaluators determined whether or not a 
response answered the question asked; if the response did 
not, no credit was given for Key Concepts. For example, 
consider this student response: 

Similar to above, some kind of mutation for the poison
and those plants were not eaten so they were able
to reproduce and pass thoses [sic] genes on to future
generations. The population of poisonous mushrooms
would soon outnumber non-poisonous ones since poisonous
mushrooms are less likely to be eaten. Over
time, animals would learn to stay away from teh [sic]
mushroom simply be [sic] appearance, so the toxin
would no longer be needed.

Although this answer demonstrates adaptive reasoning 
about the origin of toxic mushrooms, the question was about 
the loss of toxin in this population, not the origin of the 
toxin. Only the last sentence addresses the loss of the toxin, 
and it does not demonstrate any of the Key Concepts. 
Data

Data files containing all student responses, scoring, and
data analysis may be found at
https://github.com/mjwiser/ALife2016

Scoring responses 

We used EvoGrader to score student responses on two open- 
ended questions about natural selection for six Key Concepts 
and three Naive Ideas (see Box 1). Two human graders 
(MJW and LSM) scored student responses for these same 
criteria. We resolved any disagreement among the humans 
by discussion, resulting in a consensus human score. 

Statistical analysis 

We measured inter-rater reliability (IRR) between the EvoGrader
scores and the consensus human scores for each
question, as outlined in (Hallgren, 2012). Because we were
interested in the IRR of specific questions, we combined
both pre- and post-instruction responses into a single data
set. We computed IRR both for each question as a whole,
and separately for the Key Concepts and the Naive Ideas
within each question. We chose not to compute IRR for
each individual concept, or separately for pre- and post-instruction
questions, because of the lower statistical power
from examining each set separately, and the increase in multiple
comparisons this would necessitate. We also compared
the EvoGrader and human consensus scores by way of 2-tailed
paired t-tests to test for differences in the number of
Key Concepts or Naive Ideas scored. We conducted all statistical
testing in R version 3.2.3 (R Core Team, 2013).
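
As an illustration of the paired comparison described above, the following Python sketch computes a paired t statistic and its degrees of freedom. It is not the R code used in the study, and the per-response counts are made-up toy values. With 34 and 36 students each contributing a pre- and a post-instruction response, there would be 68 and 72 pairs, which is consistent with the 67 and 71 degrees of freedom reported in Table 1.

```python
import math


def paired_t(human, machine):
    """Return (t, df) for a paired t-test on two equal-length score lists."""
    diffs = [h - m for h, m in zip(human, machine)]
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    t = mean / math.sqrt(variance / n)
    return t, n - 1


# Toy data: number of Key Concepts credited per response (hypothetical).
human_counts = [3, 2, 4, 1]
machine_counts = [2, 2, 3, 1]
t_stat, df = paired_t(human_counts, machine_counts)
```

A positive t here means the human scores are higher on average, matching the sign convention stated in the caption of Table 1. Converting t to a p-value requires the t distribution, which is left to a statistics package.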

Results and Discussion 

The Inter-Rater Reliability (IRR) of EvoGrader and the consensus
human scoring of these questions is good, with values
of 0.63 for the antibiotic resistance question and 0.55 for the
mushroom question (Fig. 2). This means that more than
half of the total variance in scoring across these 9 concepts
is shared among the raters. Landis and Koch (1977) suggest
that IRR values from Cohen’s kappa in the range of 0.6 to 0.8
indicate substantial agreement among coders, and values between
0.4 and 0.6 indicate moderate agreement. By these criteria,
when all of the concepts
are analyzed together, the IRR for the antibiotic question is
substantial, and the IRR for the mushroom question is moderate.
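
For readers unfamiliar with the statistic, a minimal Python sketch of Cohen's kappa for two raters and binary scores follows. It is an illustration under stated assumptions, not the analysis code of this study: the score lists are toy data, and the labelling function encodes only the two bands quoted above.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of binary (0/1) scores."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_a = sum(rater_a) / n  # fraction of 1s assigned by rater A
    p_b = sum(rater_b) / n  # fraction of 1s assigned by rater B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if expected == 1.0:
        return float("nan")  # undefined when neither rater varies
    return (observed - expected) / (1 - expected)


def agreement_band(kappa):
    """Label kappa using only the two bands quoted in the text."""
    if 0.6 <= kappa <= 0.8:
        return "substantial"
    if 0.4 <= kappa < 0.6:
        return "moderate"
    return "outside the bands quoted in the text"


# Toy scores for ten responses on one concept (hypothetical).
human = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
machine = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
kappa = cohens_kappa(human, machine)
```

In this toy case the raters agree on 8 of 10 responses, chance agreement is 0.52, and kappa is 0.28 / 0.48, which falls in the moderate band.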

We further examined IRR separately for Key Concepts
and Naive Ideas (Fig. 3), to examine whether there was
a systematic difference between the two concept types. In
the antibiotic resistance question, the IRR is notably higher
for the Key Concepts than the Naive Ideas (0.63 vs. 0.17). In
fact, the 95% confidence interval for the Naive Ideas IRR
overlaps 0, meaning that the IRR is not statistically significantly
different from ratings being assigned at random. Conversely,
IRR in the mushroom question is consistent across
the Key Concepts and Naive Ideas (0.51 and 0.55, respectively),
showing no meaningful difference across concept
type.

What can account for these differences in IRR? One thing
to note is that when there is very low variation in a
given rater’s scoring across responses, there is very little statistical
power to detect shared variance across raters. As a
thought experiment, imagine that two different raters each assign
scores of Yes to 10% of responses, and No to 90%. Even
if the two raters both assigned their scores randomly, they
would be expected to agree 82% of the time
(0.1 × 0.1 + 0.9 × 0.9 = 0.82). IRR
analyses take into account the expected frequency of scoring
agreement, but a low variance across responses for a given
rater will negatively affect the statistical power of IRR analyses.
This is reflected in the wide confidence intervals for
the Naive Ideas in particular. For one, there are fewer potential
Naive Ideas scored (at most three Naive
Ideas per student response, versus at most six Key Concepts).
This skew in responses had a larger
impact on the Naive Ideas in the antibiotic resistance question
than elsewhere: EvoGrader scored the entire class
as expressing only five total Naive Ideas in the antibiotic question,
while the consensus human score was 90. This is part of
a general trend: for both questions, the human consensus
score differed from the EvoGrader score, and by a statistically
significant margin even when correcting for multiple
comparisons (see Table 1; all adjusted p-values <0.05). For
both questions, the human consensus score detected more
Naive Ideas than EvoGrader did. However, the humans detected
more Key Concepts than EvoGrader did for the antibiotic
question (question 1), but fewer in the mushroom
question (question 2).
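
The thought experiment in the paragraph above can be checked with a few lines of Python. The function name is illustrative, and the two raters are assumed to score independently of each other.

```python
def chance_agreement(p_yes_a, p_yes_b):
    """Probability that two independent raters give the same Yes/No score."""
    return p_yes_a * p_yes_b + (1 - p_yes_a) * (1 - p_yes_b)


skewed = chance_agreement(0.10, 0.10)    # both raters say Yes 10% of the time
balanced = chance_agreement(0.50, 0.50)  # both raters say Yes half the time
print(round(skewed, 2), balanced)
```

With skewed raters, chance alone yields 82% agreement, compared with 50% for balanced raters, so there is much less room above chance for a reliability statistic to work with.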

Several factors may serve to lower the IRR from ideal
levels. One obvious cause is mentioned in Box 1: some
student responses demonstrate reasoning about natural selection,
but do not answer the question asked. In these
cases, the humans did not credit the student with any of the
Key Concepts that did not address the question asked. EvoGrader,
on the other hand, did not have this screening mechanism.
Further, we analyzed both pre- and post-instruction
responses jointly, and we expect the number of Naive Ideas
expressed to decrease through instruction while we expect
the number of Key Concepts expressed to increase through
instruction. Such instructional effects would be a positive
outcome for students, but both may reduce variance in the
post-instructional scoring, reducing the statistical power to
detect shared variance.

What can account for the difference in results between the 
two questions? There are two potentially salient contextual 
differences between the questions. One, the first question is 
a gain of a trait, while the second is a loss of a trait. Two, 
the two questions use different taxonomic groups as their 
examples. Both of these differences have been shown in the 
literature to be important to student reasoning (Nehm and
Ha, 2011). In a future study, we will be able to disentangle
these factors through a multifactorial design that considers
multiple taxonomic groups and asks both a gain of trait and
a loss of trait question within each.

Figure 2: Inter-rater Reliability for Questions 1 and 2. Key Concepts and Naive Ideas are pooled within each question. Plotted
values are Cohen’s kappa. Error bars shown are 95% confidence intervals.


Comparison       t        df    p              adj. p
Antibiotic KC    5.779    67    2.14 × 10^-7   8.58 × 10^-7
Antibiotic NI    2.604    67    0.0113         0.0453
Mushroom KC     -2.806    71    0.00647        0.0259
Mushroom NI      3.384    71    0.00117        0.00466

Table 1: 2-tailed paired t-tests comparing EvoGrader and human consensus scoring of Key Concepts (KC) and Naive Ideas
(NI). Negative values indicate more of these concepts detected by EvoGrader; positive values indicate more of these concepts
detected by humans. A Bonferroni correction was used to generate the adj. p values.
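
The Bonferroni adjustment reported in Table 1 can be sketched as follows, assuming the correction was applied over the four comparisons in the table (the ratio of adjusted to raw p-values is consistent with this). The raw p-values below are the rounded values printed in the table, so the products can differ from the published adjusted values in the third significant digit.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Rounded p-values as printed in Table 1.
raw = {
    "Antibiotic KC": 2.14e-7,
    "Antibiotic NI": 0.0113,
    "Mushroom KC": 0.00647,
    "Mushroom NI": 0.00117,
}
adjusted = dict(zip(raw, bonferroni(list(raw.values()))))
all_significant = all(p < 0.05 for p in adjusted.values())
```

All four adjusted values stay below 0.05, in line with the statement in the Results.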


Conclusions 

EvoGrader is a useful tool for assessing student reasoning
about natural selection. Even on questions not included in
the training, it provides a reasonable level of reliability in
scoring student responses on open-ended questions of a similar
style to the ACORNS assessment. However, it is not
foolproof. In our study, EvoGrader credited students as displaying
more Key Concepts, and fewer Naive Ideas, than
our human raters did. In particular, EvoGrader may inaccurately
credit evolutionary reasoning to student responses that
do not address the specific question asked. For formative
assessments, it can be a valuable tool to get a sense of
student responses in a short period of time, but we caution
against using EvoGrader to assign points to students, given
its current limitations.

Figure 3: Inter-rater Reliability for Questions 1 and 2, broken down between Key Concepts (KC) and Naive Ideas (NI). Plotted
values are Cohen’s kappa. Error bars shown are 95% confidence intervals.

Abstract

Embodied Evolution (EE) is an evolutionary strategy based on
natural evolution in which the individuals that make up the
population are embodied and situated in an environment where
they interact in a local, decentralized and asynchronous fashion.
It has been successfully applied to collective problems, showing
its validity for on-line evolution both in simulated and
real agents. A key feature of EE is emergent
specialization, that is, the strategy is able to autonomously
generate a distribution of individuals into species if that is
advantageous in the scenario. This paper studies this feature
in more depth, analyzing how the
complexity of the task (fitness landscape) and the complexity of
the individuals (control system) affect the emergence of
specialization. The analysis is carried out using a canonical EE
algorithm in a real problem consisting of a collective
surveillance task with simulated Micro Aerial Vehicles.

Introduction 

As research has advanced in the field of evolutionary 
optimization, new approaches and techniques have allowed 
researchers to address even more complex problems. In this 
sense, a remarkable challenge is that of solving real-world 
dynamic problems in the absence of centralized and updated 
information. This type of problem appears in tasks like 
routing, surveillance, resource assignment, etc. 

Some of the most successful approaches in this line are 
based on multi-agent systems that exploit the coordination 
between the agents to provide a collective solution to the 
problem (Hanna, 2009) (Rinde, 2012). This way, each agent is 
assumed to have decentralized and out-of-date information 
and, with it, it must handle its small part of the problem in 
real-time. The effort of the evolutionary algorithm is on 
finding a global coordinated solution to the problem as an 
aggregation of the partial ones, which can be really complex 
in dynamic environments. 

With the aim of dealing with a more flexible and general
search process, some authors have included the behavioral
specialization of the agents as a new dimension of the
collective problem. That is, the agents can be heterogeneous
in operation, so specialists can emerge from evolution if they
are beneficial for the task. It could seem that this new
dimension increases the complexity of the search space, but it
has been shown that, by allowing such flexibility, simpler agents
can emerge, which leads to a general simplification of the
problem. It must be pointed out here that in this type of
heterogeneous collective evolution, the solution to the
problem is provided by the concurrent execution of the whole
population, and not by a replication of the best individual.

The most remarkable evolutionary strategies one can find
in heterogeneous collective optimization are Cooperative
Coevolution Evolutionary Algorithms (CCEA) (Wiegand,
2003) (Panait, 2010). In this type of algorithm, the control
system of each agent, typically an Artificial Neural Network,
is evolved in an independent population, although the fitness
of each individual is obtained by its joint execution with
its team. Thus, if the solution is made up of n components, a
CCEA evolves n populations, each one containing the
genotype that will define the response of each component.
Specific types of CCEAs like SANE (Gomez, 1997), Multi-agent
ESP (Yong, 1999), CONE (Nitschke, 2012) or Hyb-CCEA
(Gomes, 2015) have been widely and successfully applied
to collective optimization problems. The main problem of
CCEAs for solving real-world dynamic problems is that they
must run off-line due to their high computational
requirements. Although the evolutionary processes can be
executed in parallel, they must be serialized for evaluation,
which must be performed several times in order to provide
enough combinations of genotypes to achieve a reliable
evaluation.

An online version of CCEA is the evolutionary strategy 
known as Embodied Evolution (EE). It was created by Ficici 
and Watson in 1999, inspired by Artificial Life experiments, 
with the aim of speeding up online evolution (Ficici, 1999). 
The main difference with traditional CCEAs is that evolution 
is at the population level in EE, that is, each individual only 
carries its own genotype. Consequently, EE follows a natural 
evolution scheme in which the individuals that make up the 
population are embodied and situated in an environment 
where they interact in a local, decentralized and asynchronous 
fashion. This interaction is not driven by a preset 
synchronization mechanism as in traditional evolutionary 
algorithms, but by the result of the particular behaviors of the 
active individuals in the environment and their interactions 
with other active and passive elements within it (Schut, 2009). 
Evolution in EE is open-ended, leading to a paradigm that is 
intrinsically adaptive and highly suitable for real time learning 
in distributed dynamic problems. 

In the last decade, interest in EE has grown, mainly in the field
of multi-robot systems (Bredeche, 2012) (Elfwing, 2011)
(Eiben, 2010). As a consequence, different algorithms have
arisen, which share the same operational principles but differ
in how they implement specific operators. In subsequent
years, some of those algorithms were successfully applied to
different collective problems (Bredeche, 2012) (Elfwing,
2011) (Prieto, 2010) (Duro, 2011). This validated
the paradigm in practical terms, but it also made clear the lack
of a formal characterization that researchers in
the evolutionary computation field could consider. As a first
step towards this standardization of EE, a canonical EE
algorithm that isolates its operational principles from those of
the particular implementations was presented and thoroughly
studied in (Prieto, 2015).

The current paper continues this line of formal EE
characterization. In this case we are interested in analyzing in
depth one of the main features of EE: the emergence of
specialists (Nitschke, 2008). As shown in (Prieto, 2015) and
(Trueba, 2013), in EE the optimal number of species emerges
as required by the problem. But what features of the problem
determine the emergence of specialists? Can we anticipate the
species that will arise or understand the reason why a given
organization has emerged? These two questions are very
relevant when developing an optimization algorithm. It is
obvious that the spatial or temporal separation between
individuals promotes specialization because it prevents mating
(Trueba, 2013), but there are several more features. As is
well known in fields like Complex Systems (Mitchell, 2008)
and Ecology (Epstein, 1996), among the most relevant factors
are the complexity of the environment (Bonabeau, 1997)
(Burbeck, 2007) and the complexity of the individuals
(Anderson, 2001) (Detrain, 2002) that make up the
population.

Thus, the question we aim to address here is how
complexity affects the emergence of specialists in EE. We
have analyzed this problem from two perspectives: varying
the complexity of the environment where the agents are
situated and varying the complexity of the agent itself, that is,
its optimization capability. To carry out this analysis, we have
applied the canonical EE algorithm to a collective surveillance
task with simulated Micro Aerial Vehicles (MAVs), which is
a prototypical example of a dynamic and decentralized problem
that must be solved in real time.


Canonical Embodied Evolution Algorithm 

The canonical EE algorithm is explained in detail in (Prieto,
2015), and here we provide just a brief summary of its
main parameters and operation. During its development we
decided to simplify and extract the bare-bones specification of
a generalized EE algorithm in terms of as few parameters as
possible. This generalization has been achieved by
replacing the activation of particular operators, triggered by
events produced in the real problem, with probability
distributions, which are not task dependent. A second
objective in the definition of a canonical EE algorithm was to
do away with the bonds the environment imposes on the
structure and operation of this type of situated algorithm. The
adaptation to the environment determines the type of tasks
that are considered tractable. It also makes the algorithm and
its behavior even more task dependent and, as a result, it
becomes more complicated to extract general conclusions
from experiments. Consequently, the circumstantial/spatial
interactions have been replaced in the canonical algorithm by
stochastic variables, which follow probability functions. The
canonical EE pseudo-code is the following:

1  While simulation active do
2    Random creation of population
3    For each interaction
4      For each individual
5        Assign fitness <- (Scenario interaction)
6        If random < P_mating
7          Look for fertile partners (P_selection)
8          Select a partner for recombination (P_eligibility)
9          Generate offspring and store it (P_ls)
10       If random < P_replacement
11         New individual <- current offspring genotype
12         Reset individual
13 End

The canonical EE performs three basic processes: 
evaluation (line 5 in the pseudo-code), mating (lines 6 to 9) 
and replacement (lines 10 to 12). The processes and 
parameters that define the algorithm response are: 

- Mating selection: it has been modeled as an event that
is triggered by a uniform probability function that
depends on a single parameter, the probability of
mating for every time step ($P_{mating}$). This probability
can be calculated based on the maximum number of
matings one individual can perform, which can be
assimilated to a “maximum tournament window size”
($S_{max}$), and on the maximum lifetime of an individual
($T_{max}$), which is the same for every individual:

$$P_{mating} = \frac{S_{max}}{T_{max}}$$

- Selection policy: the probability of being eligible as a
candidate for mating ($P_{eligibility}$) is defined through a
function ($\psi_e$) that is based on three different criteria:
the genotypic distance ($\varsigma_{cand}$), a distance measure
between certain status parameters ($\rho_{cand}$) which vary
during evaluation time due to both the phenotype of
each individual and environmental circumstances (for
example, position in a geometric space), and the
fitness value ($\gamma_{cand}$):

$$P_{eligibility} = \psi_e(\varsigma_{cand}, \gamma_{cand}, |\rho_{cand}|)$$

- Genotypic recombination: as in natural evolution, two
main recombination operators can be distinguished:
mutation and crossover. In order to characterize this
genotypic recombination in a general and simple way,
a new intrinsic parameter is defined: the probability of
using a local search strategy ($P_{ls}$), that is, a mutation
operator. It is a measure of the exploration and
exploitation balance through the ratio between
crossover and mutation frequency.

- Replacement: the replacement process is modeled
here as triggered by a replacement probability
($P_{replacement}$) and it is defined based on a more intuitive
and manageable parameter, which is the life
expectancy ($T_{exp}$):

$$P_{replacement} = \frac{1}{T_{exp}}$$
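
The two probabilities defined in the list above can be written directly in Python. This is a minimal sketch: the function names and the numeric values are hypothetical and are not taken from the experiments.

```python
def mating_probability(s_max, t_max):
    """P_mating = S_max / T_max, the per-time-step probability of mating."""
    return s_max / t_max


def replacement_probability(t_exp):
    """P_replacement = 1 / T_exp, the per-time-step probability of replacement."""
    return 1.0 / t_exp


# Hypothetical values: at most 10 matings over a maximum lifetime of 500 steps.
p_mating = mating_probability(s_max=10, t_max=500)
# An individual that lives the full T_max is expected to mate S_max times.
expected_matings = p_mating * 500
# An individual with a life expectancy of 250 steps.
p_replacement = replacement_probability(t_exp=250)
```

Defining both events through per-step probabilities is what removes the need for a global synchronization mechanism: each individual draws its own random numbers at each time step.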

The life expectancy is defined for each individual in each
time step based on its current fitness ($Q_i$), which depends on
its genotype and the genotypes of the others. Specifically, a
piecewise function has been defined to assign the life
expectancy ($T_{exp}$) for any fitness value using a linear model.
This function has two pieces, one for individuals with low
fitness ($Q_i < \frac{2}{3} Q_{max}$) and another for individuals with high
fitness ($Q_i > \frac{2}{3} Q_{max}$).

Fig. 1. Graphical representation of the function that
parameterizes replacement

The use of this function to model replacement allows
covering a broad range of different replacement policies,
ranging from those with synchronous replacement ($T_{mat} = T_{max}$
and $C_m = 1$) to those with a strongly exploitative operation
through elitism ($T_{mat} = 1$ and $C_m = 0$) by way of any
intermediate combination. As can be observed in Fig. 1,
the relation between fitness and life expectancy has been
modeled by means of four fixed parameters:

1. Maturation time ($T_{mat}$): it is defined here as a percentage
of $T_{max}$ and it sets the minimum $T_{exp}$ for an individual
and, as a consequence, it captures the sum of two time
periods, the minimum time required to reliably evaluate
an individual (1 in our case) and the lifetime for those
with zero fitness. During the maturation time the
individual cannot be replaced.

2. Maximum current fitness ($Q_{max}$): since the maximum
fitness one individual can achieve for a specific scenario
is not known beforehand, this value represents the
maximum known fitness so far, and it is used to calculate
a relative fitness $Q_r = Q_i / Q_{max}$.

3. Mediocrity coefficient ($C_m$): it establishes the expected
lifetime for the individuals with a relative fitness value
equal to 2/3 ($Q_r = Q_i / Q_{max} = 2/3$) of the maximum current
fitness (it can be seen as the expected lifetime of the
mediocre individuals). This parameter ranges from 0
(meaning their life expectancy will equal $T_{mat}$) to 1
(meaning it will equal $T_{max}$).

4. The maximum lifetime ($T_{max}$): it is the maximum number
of time steps a chromosome can participate in the
evolution. It is assigned to individuals with their fitness
equal to the maximum current fitness.
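
One way to read the four parameters above is as a piecewise-linear function through three points: relative fitness 0 maps to T_mat, relative fitness 2/3 maps to T_mat + C_m (T_max - T_mat), and relative fitness 1 maps to T_max. The Python sketch below implements that reading. It is an assumption reconstructed from the parameter definitions, not the authors' formula, and it takes T_mat directly in time steps.

```python
def life_expectancy(q_i, q_max, t_mat, t_max, c_m):
    """Piecewise-linear T_exp as a function of relative fitness Q_r = Q_i / Q_max.

    Assumed anchor points: (0, T_mat), (2/3, T_med), (1, T_max),
    where T_med = T_mat + C_m * (T_max - T_mat).
    """
    q_r = q_i / q_max if q_max > 0 else 0.0
    q_r = min(max(q_r, 0.0), 1.0)  # clamp to [0, 1]
    t_med = t_mat + c_m * (t_max - t_mat)
    knee = 2.0 / 3.0
    if q_r <= knee:  # low-fitness piece
        return t_mat + (t_med - t_mat) * (q_r / knee)
    # high-fitness piece
    return t_med + (t_max - t_med) * ((q_r - knee) / (1.0 - knee))


# Hypothetical parameters: T_mat = 10, T_max = 100, C_m = 0.5.
worst = life_expectancy(0.0, 1.0, 10, 100, 0.5)
mediocre = life_expectancy(2.0, 3.0, 10, 100, 0.5)
best = life_expectancy(1.0, 1.0, 10, 100, 0.5)
```

Under this reading, setting C_m = 1 with T_mat = T_max gives every individual the same lifetime (synchronous replacement), while C_m = 0 with a small T_mat keeps only near-best individuals alive for long, which matches the two extremes described in the text.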


As a summary, six intrinsic parameters and an eligibility
function have been defined to encompass and generalize the
operation of a general EE algorithm: $T_{max}$, $S_{max}$, $P_{ls}$, $Q_{max}$, $T_{mat}$,
$C_m$ and $\psi_e$. See (Prieto, 2015) for a sensitivity analysis of the
canonical EE algorithm.
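
Putting the pseudo-code and the parameters together, a toy version of the loop might look like the Python sketch below. Everything task-specific is an assumption made for illustration: the fitness function, the flat genotype of real numbers, a constant eligibility function (any other individual is an acceptable partner), Gaussian mutation, uniform crossover, the piecewise-linear life expectancy, and all default parameter values. It is not the implementation studied in (Prieto, 2015).

```python
import random


def fitness(genotype):
    """Toy fitness in (0, 1]; stands in for the scenario interaction."""
    return 1.0 / (1.0 + sum(g * g for g in genotype))


def life_expectancy(q_r, t_mat, t_max, c_m):
    """Assumed piecewise-linear T_exp through (0, T_mat), (2/3, T_med), (1, T_max)."""
    t_med = t_mat + c_m * (t_max - t_mat)
    knee = 2.0 / 3.0
    if q_r <= knee:
        return t_mat + (t_med - t_mat) * q_r / knee
    return t_med + (t_max - t_med) * (q_r - knee) / (1.0 - knee)


def run_canonical_ee(pop_size=20, genes=3, steps=2000, s_max=10, t_max=200,
                     t_mat=20, c_m=0.5, p_ls=0.5, seed=1):
    rng = random.Random(seed)
    p_mating = s_max / t_max
    population = [{"genotype": [rng.uniform(-1, 1) for _ in range(genes)],
                   "offspring": None, "age": 0} for _ in range(pop_size)]
    q_max = 0.0  # maximum fitness known so far
    for _ in range(steps):
        for ind in population:
            ind["age"] += 1
            q_i = fitness(ind["genotype"])              # evaluation
            q_max = max(q_max, q_i)
            if rng.random() < p_mating:                 # mating
                partner = rng.choice([p for p in population if p is not ind])
                if rng.random() < p_ls:                 # local search: mutation
                    child = [g + rng.gauss(0, 0.1) for g in ind["genotype"]]
                else:                                   # uniform crossover
                    child = [rng.choice(pair) for pair in
                             zip(ind["genotype"], partner["genotype"])]
                ind["offspring"] = child                # store the offspring
            t_exp = life_expectancy(q_i / q_max, t_mat, t_max, c_m)
            mature = ind["age"] >= t_mat and ind["offspring"] is not None
            if mature and rng.random() < 1.0 / t_exp:   # replacement
                ind["genotype"] = ind["offspring"]
                ind["offspring"] = None
                ind["age"] = 0                          # reset individual
    return population


final_population = run_canonical_ee()
```

Note that there is no generation counter: each individual mates and is replaced on its own schedule, which is the asynchronous, decentralized behaviour the canonical algorithm is meant to capture.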

To apply the canonical EE algorithm to real problems, two 
operators have to be adapted to the constraints the scenario 
imposes, and therefore, their dependence on the task is 
unavoidable: the mating operator, which is constrained by the 
communication limitations of the scenario and individuals to 
exchange their genetic codes, and the evaluation operator, 
which relies on the actual behavior and on the state of the 
scenario. 


Collective Surveillance Task 

To analyze how complexity affects the emergence of species
using the canonical EE, we have designed a simulated
environment in which a fleet of Micro Aerial Vehicles
(MAVs) has to collectively survey an indoor scenario where
no centralized information is available. To do so
properly, the MAVs need to locate themselves to keep track of
their trajectories and to share this information with other
robots. Their positions are determined
using their IMU, artificial landmarks that can be sensed using
the onboard camera, and the position of other MAVs in sight.
The control of each of the MAVs is provided by an Artificial
Neural Network (ANN), and the parameters of this ANN are
adjusted using the canonical EE algorithm. Thus, EE is in
charge of organizing the MAVs in the scenario in order to
increase the accuracy of the fleet location and, consequently,
the speed at which a new point of interest is reached.

Experiment description 

The experimental setup has been defined in simulation, based 
on a real indoor gathering task performed by MAVs, as a 
preliminary step before transferring the algorithm, or the 
controllers directly, to a fleet of real MAVs. The specific MAV that 
has been modeled is the Parrot ARDrone 2.0, a very popular 
general-purpose commercial quadcopter. Indoor navigation is 
performed then by simulated ARDrones that, as in the case of 
the real ones, are provided with an IMU and a camera to 
perform autonomous positioning. The most important aspect 
of the simulation is the model of the response of the location 
sensors when a certain maneuver is carried out. First, the 
IMU provides a velocity signal for each degree of 
freedom, which has to be integrated to produce the estimated 
motion. The velocity estimate is modeled as following 
a normal distribution centered on the real input velocity 
with its corresponding variance matrix, as is frequently 
assumed in real navigation. 
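
As a rough illustration of this velocity noise model, the following sketch integrates noisy velocity readings into an estimated position. The arena side, maximum velocity and standard deviation are taken from Table 1; the function name, the 2D state, the unit time step and the use of independent noise per axis (that is, a diagonal variance matrix) are assumptions made for the example and are not details given in the text.

```python
import random

L = 768.0               # side of the arena (Table 1)
V_MAX = L / 50.0        # maximum velocity (Table 1)
SIGMA_V = V_MAX / 50.0  # standard deviation of the velocity estimate (Table 1)


def dead_reckon(start, true_velocities, dt=1.0, rng=None):
    """Integrate noisy velocity readings into a position estimate.

    Each reading is drawn from a normal distribution centred on the true
    velocity. Noise is assumed independent per axis (hypothetical choice).
    Returns (true_position, estimated_position).
    """
    rng = rng or random.Random()
    true_pos = list(start)
    est_pos = list(start)
    for velocity in true_velocities:
        for axis, v in enumerate(velocity):
            measured = rng.gauss(v, SIGMA_V)  # simulated IMU reading
            true_pos[axis] += v * dt
            est_pos[axis] += measured * dt
    return tuple(true_pos), tuple(est_pos)
```

Because the noise is integrated, the gap between the two returned positions tends to grow with the number of steps, which is why the MAVs need tags to bound their location error.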

The model implemented for the artificial markers 
represents the use of the AprilTags created by The APRIL 
Robotics Laboratory at the University of Michigan (Olson, 
2011) since it is also what we have used during our tests with 
the real ARDrones. Therefore, the location estimation 
provided by the markers is based on a real accuracy model 
which was produced in our laboratory using an ARDrone 2.0 
and 40 cm long AprilTags, and that can be formulated as a 
function L_e which relates the variance of the estimation 
Var(p_drone) to the relative distance (||p_drone - p_tag||) and 
orientation (yaw_drone:tag) between camera and tag: 

\mathrm{Var}(p_{drone}) = L_e(\lVert p_{drone} - p_{tag} \rVert, yaw_{drone:tag}) 

These tags (permanent tags) provide an absolute and 
potentially accurate position estimation, which does not 
degrade with time. In order to improve the performance of the 
navigation by improving the accuracy of the MAVs, the same 
type of tags are attached to the body of the quadcopters 
(mobile tags), which make up a hybrid location sensor. 
The detection provided by a mobile tag still 
constitutes a direct location estimation but, unlike in the case 
of permanent tags, its accuracy is variable and it is modelled 
as proportional to the velocity of the MAV. As a 
consequence, those MAVs which serve as mobile tags 
have to stay still to be able to provide location accuracy. 
The use of this type of mobile tag turns accuracy 
into a resource that MAVs can obtain and share in order 
to slow down its degradation and, therefore, to accomplish 
their main task more efficiently. It could also allow some of 
the MAVs to improve their location accuracy by being 
‘nourished’ by mobile tags, without having to visit static tags, 
which are frequently not optimally located. 

Fig. 2. Graphical representation of a portion of the scenario. 
The distance from each MAV to its own shadow represents the 
current estimation error for that MAV. The red MAV represents a 
MAV acting as a mobile tag. The circles around the tags 
represent the different levels of accuracy provided by the tag. 


Parameter                               Value 
Side of the arena (L)                   768 
Total area                              L^2 
Fixed tags detection range              L/4 
Mobile tags detection range             L/16 
Max velocity (V_max)                    L/50 
Standard deviation for the velocity     V_max/50 
Max accuracy provided by an AprilTag    L/10 

Table 1. Design parameters of the simulated scenario 


To reduce computational effort, the scenario was discretized 
as a 768 x 768 (square length units) non-toroidal square arena, 
which is provided with 4 fixed tags placed randomly. Figure 2 
shows a schematic representation of a portion of the arena. 
Each MAV is associated with both a real and an estimated 
position. The former is shown with a solid color in the 
simulation and the latter with a softened shadow of the MAV. 
The MAV has no knowledge of its real position; this is just an 
externally obtained value for display purposes. The greater the 
distance between those positions, the higher the location 
estimation error and the lower the exploration level. Table 1 
contains the specific parameters that define the scenario. 

The final objective of the surveillance task is for the fleet of 
MAVs to continuously cover the maximum possible area. 
In order to perform the search of unexplored areas, the MAVs 
can keep record of the areas they have already explored. This 
is stored in an ‘exploration map’ carried by each MAV and 
that can also be shared with others when they meet. The 
exchange of this information allows the task to be solved 
cooperatively since it enables the distribution of the search 
among the group of MAVs. However, since the estimation of 
one’s position has a varying accuracy, the updating of the 
exploration map has to take that into account. The exploration 
of a cell is modeled as an exploration probability (P_ex), which 
indicates the probability that a certain cell has been 
explored. This probability (P_ex) can be directly calculated as 
the ratio between the size of a cell (L_cell) and the location 
error range (E_loc) and is stored in the exploration map: 

P_{ex} = \frac{L_{cell}^2}{\pi \cdot E_{loc}^2} 


Subsequent MAVs must decide whether or not to re-explore 
that cell based on the guaranteed exploration 
probability P_ex. Therefore, the collaborative exploration map 
is the only information that a MAV gets from the scenario 
about the surveillance process. The individual fitness of each 
agent increases each time it covers an unexplored cell. The 
global fitness of the multi-agent system, which must be 
optimized by the canonical EE algorithm, is the sum of the 
exploration probabilities of all the cells j in the scenario: 

G = \sum_{j} P_{ex,j} 
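
The exploration map and the global fitness can be sketched as follows. This is only an illustration: P_ex is taken here as the ratio of the cell area to the area of a circle of radius E_loc, which is one reading of the ratio described in the text. Capping the value at 1, keeping the highest probability recorded for a cell, and all function names are assumptions introduced for the example.

```python
import math


def exploration_probability(cell_size, error_range):
    """P_ex as cell area over the area of a circle of radius E_loc.

    The cap at 1.0 (a probability cannot exceed 1) is an assumption.
    """
    if error_range <= 0:
        return 1.0
    return min(1.0, cell_size ** 2 / (math.pi * error_range ** 2))


def update_map(exploration_map, cell, p_ex):
    """Record P_ex for a cell, keeping the highest value seen (assumed rule)."""
    exploration_map[cell] = max(exploration_map.get(cell, 0.0), p_ex)


def global_fitness(exploration_map):
    """Global fitness G: sum of P_ex over all cells j in the map."""
    return sum(exploration_map.values())
```

With such a map, a MAV with a small location error contributes close to 1 for each cell it covers, while a MAV with a large error contributes much less, which is what makes location accuracy worth sharing.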

It must be highlighted that the MAVs are not transparent to 
each other and can collide with others and with the walls. 
When a MAV collides, it loses part of its location accuracy. 
To avoid collisions, they are provided with an obstacle sensor 
which mimics an infrared ring that detects nearby obstacles. 

Individual encoding 

The control architecture for each agent is a multilayer 
perceptron ANN with three layers: one input layer with three 
neurons, one hidden layer with a configurable number of 
neurons and one output layer with one neuron. The first input 
is the current exploration capability of the agent, which 
is based on its location error. The second input provides the 
maximum attainable exploration in the surrounding areas and the 
direction towards it. The third input provides the distance to 
the closest obstacle (up to the sensing range of the sensor) and 
the direction towards it. The size of the hidden layer is 
varied across experimental configurations to analyze 
the complexity of the behavior of the agent, from 1 neuron in 
the simplest case to 10 neurons in the most complex 
one. The output of the neural network modulates the 
behavior of the MAV between four pre-learned basic 
behaviors, namely: move towards the most unexplored area 
detected, move towards the nearest tag, move away from the 
closest obstacle and stay still to serve as a dynamic tag. These 
behaviors are individually selected at each time step, and the 
frequency with which each agent displays each behavior 
determines the agent species. 
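
A minimal sketch of such a controller is given below. The weight counts reported for the experiments (4, 8 and 40 for 1, 2 and 10 hidden neurons) suggest a network without bias terms, which is what the sketch uses. Everything else is an assumption made for illustration: the mapping of genes in [0, 1] to weights in [-1, 1], the tanh activation, the splitting of the output range into four equal bins, and all identifiers.

```python
import math

BEHAVIORS = (
    "explore_most_unexplored_area",
    "go_to_nearest_tag",
    "avoid_closest_obstacle",
    "stay_still_as_mobile_tag",
)


def decode(chromosome, n_hidden):
    """Split 4 * n_hidden genes in [0, 1] into hidden and output weights."""
    if len(chromosome) != 4 * n_hidden:
        raise ValueError("expected 3 * n_hidden + n_hidden genes")
    weights = [2.0 * g - 1.0 for g in chromosome]  # assumed scaling to [-1, 1]
    hidden = [weights[3 * h:3 * h + 3] for h in range(n_hidden)]
    output = weights[3 * n_hidden:]
    return hidden, output


def select_behavior(chromosome, n_hidden, inputs):
    """Run the 3 x n_hidden x 1 network and map its output to a behavior."""
    hidden_w, output_w = decode(chromosome, n_hidden)
    hidden = [math.tanh(sum(w * x for w, x in zip(ws, inputs)))
              for ws in hidden_w]
    y = math.tanh(sum(w * h for w, h in zip(output_w, hidden)))  # in (-1, 1)
    # Assumed rule: four equal-width bins over the output range.
    index = min(int((y + 1.0) / 2.0 * len(BEHAVIORS)), len(BEHAVIORS) - 1)
    return BEHAVIORS[index]
```

Counting how often each behavior is returned over the lifetime of an agent gives the behavior frequencies that the text uses to assign the agent to a species.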


Parameter                              Value 
Iterations                             20000 
Population size                        40 
Maximum lifetime (T_max)               1000 
Maturity time (T_mat)                  25 
Selection criteria (ψ_e)               Higher fitness (y_cand) 
Tournament selection size (S_max)      40 
Local search probability (P_ls)        1.0 
Mediocrity coefficient (C_m)           1.0 
Maximum current fitness (Q_max)        Automatic 
Chromosome length                      {4, 8, 40} x [0, 1] 

Table 2. Parameters of the canonical EE algorithm 


The implementation of the canonical EE algorithm used 
here follows the pseudocode presented in section 2, while the 
specific values for the parameters are those shown in Table 2. 
These values were selected according to the conclusions 
extracted from (Prieto, 2015). 

Results 

This section shows the results obtained for different 
configurations of the scenario in order to test how 
variations both in the complexity of the individuals and in the 
complexity of the environment affect the outcome of the 
evolution. The complexity of the individuals has been 
modified by changing the size of the ANN which controls the 
MAV. The complexity of the environment has been changed 
in two ways. First, by including more or fewer permanent tags, 
which makes mobile tags more necessary (or 
indispensable if there are no permanent tags) as the number 
of permanent tags decreases. Second, by modifying the impact 
of the collisions on the degradation of the accuracy of the 
MAVs, which makes their navigation more or less complex (from 
neglecting the obstacle sensor to rigorously avoiding obstacles). 

To clarify the canonical EE response in this type of 
collective optimization problem, we will describe first the 
emergence of specialization in a representative run performed 
using the following experimental configuration: no permanent 
tags, an ANN with 4 weights (3x1x1) and no penalty for 
collisions. The top plot of figure 3 shows the evolution of the 
global fitness of the population during 20000 iterations while 
the bottom one displays the average number of species per 
iteration (N_s). This number of species was calculated using 
the metric defined by the Davies-Bouldin Index (DBI) 
(Davies, 1979) applied to a k-means clustering algorithm. 
Since that metric does not consider the existence of a single 
cluster, an auxiliary species of ten individuals is always 
included in the data to be clustered. The number of species 
(N_s) is obtained as the average of all possible numbers of 
clusters (from 1 to 10), each weighted by the inverse of the DBI 
associated with it, so that 
the strong discretization of a low number of species is 
avoided and the measurement is more accurate. To account 
for the inclusion of the auxiliary species, the final result is 
obtained by subtracting one from the weighted average, that 
is: 


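N_s = \frac{\sum_i i \cdot DBI(i)^{-\gamma}}{\sum_i DBI(i)^{-\gamma}} - 1 

where the sums run over all candidate numbers of clusters i. The 
parameter \gamma provides a smoothing coefficient: the 
lower the coefficient, the more discrete the result. 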
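
The weighted average described above can be sketched as follows, assuming the DBI has already been computed for each candidate number of clusters (for instance from k-means runs on the behavior data plus the auxiliary species). The function name and the dictionary input are hypothetical, and the smoothing coefficient is deliberately left out: the sketch uses plain inverse-DBI weights.

```python
def species_count(dbi_by_k):
    """Estimate the number of species from DBI values per cluster count.

    dbi_by_k maps a candidate number of clusters k to its Davies-Bouldin
    index (lower is better). Each k is weighted by 1 / DBI(k), and one is
    subtracted to discount the auxiliary species.
    """
    if not dbi_by_k:
        raise ValueError("need at least one candidate number of clusters")
    weights = {k: 1.0 / dbi for k, dbi in dbi_by_k.items()}
    total = sum(weights.values())
    weighted_mean = sum(k * w for k, w in weights.items()) / total
    return weighted_mean - 1.0
```

Because the result is a weighted mean rather than the single best k, it varies continuously between integers, which is the reason given in the text for avoiding a hard choice of the number of clusters.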

Fig. 3. Evolution of the global performance of the population 
(top) and average number of species (bottom) during 20000 
evolution steps 

As can be observed in the top plot of figure 3, in this run 
the population starts with low performance but quickly 
achieves a successful global fitness level (in fewer than 300 
iterations). As demonstrated in several previous works (Prieto, 
2010) (Duro, 2011), a remarkable strength of EE is the 
achievement of satisfactory results after few time steps. The 
number of species also converges to a value around 2, which 
can be observed in the bottom plot of figure 3. 

Fig. 4. Frequency of activation of each pre-learned behavior 
for the whole population (top) and individual fitness of each 
of the 40 individuals of the population (bottom) 

The top plot of figure 4 shows the frequency of activation 
of each pre-learned behavior for the whole population, and the 
bottom one shows the individual fitness of each of the 40 
individuals of the population. The color code is the same 
for all the plots in figure 4 and in figure 5, which depicts 
individual parameters: red for those individuals with 
predominant become mobile tag behavior, green for exploring 
the surrounding area, blue for moving towards the closest tag 
and yellow for avoiding obstacles (although not present in this 
run since the penalty is set to zero). It can be seen in the top 
plot of figure 4 that, up to iteration 8000, two main species 
have emerged: become mobile tag (red) and exploring the 
surrounding area (green). As shown in the bottom plot, being 
a mobile tag provides a higher individual fitness to the agents. 


After around 8000 iterations, a period starts in which the 
population becomes destabilized, which is something that happens 
often in a co-evolutionary process, and the global performance 
drops quickly (see figure 3, top). This run was indeed selected 
to illustrate both the dynamism of this type of evolution and 
the capability of the algorithm to recover from such a period and 
achieve a successful performance again. As shown in the top 
plot of figure 4, in this unstable period the number of species 
decreases to almost one, since the mobile tag species (red) becomes 
extinct. Moreover, there is an increase in the go to tag 
species seeking accuracy (blue), but it is useless since there 
are no longer any tags available. A few iterations before 10000, 
only explorers survive, trying to use the little remaining 
accuracy to explore an environment which is becoming more 
and more uncovered. It takes the algorithm around 2000 more 
time steps of low-quality individuals (all of them are 
condensed in a black line shown in the lower part of the 
bottom plot of figure 4) to finally produce the two 
species required to successfully explore the environment 
(become mobile tag and exploring the surrounding area) and to 
raise the global fitness again to a successful level. 

Figure 5 contains a representation of the genes of the 
population along this evolution. To do so, each genotype is 
projected into two independent dimensions. Those individuals 
that live less than 100 time steps are depicted in black 
meaning that there was no time to consistently assign a 
behavior to them. As can be observed in figure 5, during the 
instability gap (iterations 8000 to 12000), the genomes of the 
population expand along the genetic state space, producing a 
higher number of ephemeral species that, just before the 
recovery (iteration 12000), decrease quickly to converge again 
to a configuration of two main species. 

Fig. 5. Representation of the genes of the population along the 
evolution where each genotype has been projected into two 
independent dimensions. 

Figure 6 displays 6 plots where the correlation between 
performance and number of species has been studied for a set 
of configurations that vary the environment and individual 
complexities. The size of the neural network was set to 3x1x1 
(4 weights), 3x2x1 (8 weights) and 3x10x1 (40 weights), which 
are displayed in the top, middle and bottom rows of figure 6. 
Moreover, the environment was set without and with 
permanent tags (left and right columns in figure 6), and with 
and without obstacle penalties (red and blue lines in all the 
plots). The data displayed in this figure were obtained from 5 
runs of 20000 iterations each. 

The comparison shows several aspects of this interaction 
between complexity and the outcome of the co-evolutionary 
process. In the case of a scenario without permanent tags and 
without penalties for collisions (left column, blue lines), as 
expected, the simplest controller (4 weights) tends to produce 
two or three species to optimize its behavior (peak value of 
the blue curve shown in the top-left plot). If the number of 
weights of the ANN is doubled (middle-left plot, blue line), 
efficient solutions can be found with 
several configurations of species, with a slight tendency 
towards one-species configurations, which indicates a greater 
robustness and versatility. However, if we set an ANN with 
up to 40 weights (bottom-left plot, blue line), we observe a 
decrease in the overall performance and the system now tends 
to avoid one-species configurations. After studying 
different runs with this configuration (40 weights), it can be 
seen that it is harder for the algorithm to find a solution, and 
also to converge to only one species in such a high-dimensional 
search space, and that some of the runs provide 
poor performance. 

Fig. 6. Correlation between performance and number of 
species for a set of configurations that vary the environment 
and individual complexities. Top plots correspond to 3x1x1 (4 
weights) ANN, middle ones to 3x2x1 (8 weights) and bottom 
ones to 3x10x1 (40 weights). Left plots do not have 
permanent tags while right ones do. Finally, the red lines 
correspond to executions with obstacle penalties while the 
blue lines correspond to executions without them. 

When we activate the penalties for collisions (left plots, red 
lines), navigation becomes more complex, and 
consequently the performance is much lower for 4 and 8 
weights (top and middle plots). However, in the case of 40 
weights the penalties do not seem to have a great effect, which 
indicates that, although this configuration is harder to adjust, 
it is less affected by the complexity of the environment. On the 
other hand, when the environment becomes simpler because the 
permanent tags are included (right column plots), the 
evolution, in almost all the configurations, tends to create only 
one species which is able to perform successfully. Again, if 
the penalties are included (red lines), the performance 
decreases but the tendency regarding the species remains the 
same. Finally, the controller with 40 weights is, again, 
much less affected by the penalties and again exhibits a 
tendency to avoid one-species configurations. 

In summary, for this experimental setup an intermediate 
complexity of the ANN (8 weights) has proven to be 
beneficial both in terms of performance and in terms of 
stability regarding the number of possible species. Using complex 
individuals or very simple ones led to a more unstable 
response of the algorithm. The more complex 
controller is harder to evolve, but it is less 
affected by the complexity of the scenario. Finally, simpler 
scenarios tend to create one-species configurations for most of 
the controllers. 

In terms of the response of the algorithm, its capability to 
adapt the behavior and structure of the population to 
individual and scenario configurations of different levels of 
complexity has been shown. The algorithm is able to take 
advantage of the heterogeneity of the population, making use 
of specialization when it is required to achieve a good global 
performance, as has been repeatedly observed in 
natural evolution in real ecosystems (colonies of ants, termites 
or bees). It also tends to simplify the structure of the 
population when the scenario is simple in relation to the 
complexity of the individuals, due to the evolutionary cost of 
adjusting a large genome. Interestingly, if the controller is 
highly complex, the algorithm fails to produce an efficient 
population in some runs, but when it succeeds the result is more 
robust to changes in the complexity of the task. 


Conclusions 


This paper has studied the impact of complexity on the 
emergence of collective solutions to distributed problems by 
means of an implementation of Embodied Evolution for a 
collective surveillance task with simulated Micro Aerial 
Vehicles. The algorithm used, canonical Embodied Evolution, 
has already been tested for several applications and has shown 
its capability to decompose a task into several subtasks and to 
improve performance by generating different species among 
the individuals of the population. This work has presented a 
first attempt to look more deeply into the mechanisms that affect 
the creation of those species when the whole population pursues a 
common goal. The study has shown that variations in the 
complexity of the definition of the problem at different levels, 
namely the complexity of the scenario and the complexity of the 
individuals, have a strong impact on the outcome of the 
algorithm. In terms of its response, the 
algorithm has found efficient groups of controllers for 
individual and scenario setups with different levels 
of complexity. As in the case of natural evolution, which 
embodied evolution approaches mimic, the algorithm works 
with, and is able to take advantage of, heterogeneous 
populations when they are required to achieve a good global 
performance by making use of specialization. Our ongoing 
work, based on the results presented in this paper, is focused 
on including the number of parameters of the controllers as an 
individual evolvable parameter. The main goal will not be to 
obtain an optimal value for the size of the neural network of 
the whole population, since this can already be analyzed from 
the current results, but to study the dynamics produced when 
populations can be heterogeneous in both individual behavior 
and complexity. 


Abstract 

Novelty search is a powerful biologically motivated method 
for discovering successful behaviors, especially in deceptive 
domains like those in artificial life. This paper extends the 
biological motivation further by distributing novelty search 
to run in parallel in multiple islands, with periodic migration 
among them. In this manner, it is possible to scale novelty 
search to larger populations and more diverse runs, and also 
to harness available computing power better. A second extension 
is to improve novelty search’s ability to solve practical 
problems by biasing the migration and elitism towards higher 
fitness. The resulting method, DANS, is shown to find better 
solutions much faster than pure single-population novelty 
search, making it a promising candidate for solving deceptive 
design problems in the real world. 

Introduction 

Novelty search is a new approach to population-based search 
motivated by the creativity and diversity of biological evolution 
(Stanley and Lehman, 2015; Lehman and Stanley, 
2010a). Instead of optimizing a fitness objective, novelty 
search maximizes phenotypical diversity. Individuals 
seek novel niches to fill, developing emergent problem-solving 
abilities in the process. Novelty search is particularly 
powerful in domains that are deceptive, where it is 
necessary to discover low-fitness stepping stones first before 
actual solutions can be reached. Many tasks in artificial life 
are deceptive in this way, including behaviors that require 
developing learning, memory, and communication abilities 
(Lehman and Miikkulainen, 2014). Novelty search is thus 
a promising approach to constructing complex behavior in 
artificial life domains. 

This paper aims to improve novelty search as a general 
problem-solving method in two ways. First, a distributed 
version of novelty search is developed. The idea is that 
search progresses in parallel in separate islands, and the best 
(i.e. most novel) individuals are periodically exchanged between 
them. Such a distribution is motivated by biological 
evolution, potentially leading to an implementation that can 
account for biological phenomena more accurately. It is also 
a well-known diversity-maintenance technique in evolutionary 
algorithms in general (Whitley et al., 1999). However, 
the main motivation in this paper is to make novelty search 
computationally more powerful. Distribution makes it possible 
to scale novelty search to a much larger pool of individuals, 
and to take advantage of diversity between different 
novelty search runs. It also makes it possible to harness 
available parallel computing resources to serve novelty 
search. It therefore makes it possible to use novelty search 
to solve harder problems faster. 

Second, a principled way of guiding novelty search towards 
high-fitness solutions is developed. Novelty is still 
the primary selection mechanism, but fitness is used to bias 
the process in two ways: (1) by migrating only the most fit 
individuals across the islands, and (2) by selecting the more 
fit of two similar individuals into the elitist pool. Such subtle 
biases do not prevent novelty search from creating and 
retaining novel individuals, but they make it more likely to 
create individuals that are solutions to the given problem. 

Together these two extensions result in a powerful version 
of novelty search that can be used to solve difficult design 
problems in the real world. As a demonstration, in this paper 
they were implemented in an existing distributed evolution 
system called EC-Star (O’Reilly et al., 2013). This hub-and-spoke 
architecture manages a number of clients running 
separate evolutionary searches. A key feature of EC-Star is 
age-layering (Hodjat and Shahrzad, 2013): candidates are 
first evaluated on a small number of samples and, if they are 
promising, on more samples. Age-layering makes evolution 
more efficient, decreasing run times by an order of magnitude 
or more (Shahrzad et al., 2016). It is also well-suited for 
evaluating novelty across many separate novelty searches. 

The resulting method, Distributed Age-Layered Novelty 
Search, or DANS, is demonstrated on two challenging engineering 
design tasks: the 11-bit multiplexer, and the eight-input 
sorting network. The results show that DANS can find 
better solutions much faster than single-population novelty 
search. It is therefore a promising artificial life approach for 
solving deceptive problems in the real world. 

Background and Related Work 

The idea of divergent search, of which novelty search is an 
example, is first discussed, followed by existing work on 
combining novelty with fitness. The distributed evolution 
platform of EC-Star with age-layering, used to implement 
DANS, is then reviewed. 

Objective vs. Divergent Search 

In traditional objective-based search, a population of 
candidate solutions is evolved to maximize a specific measure, 
or objective, called a fitness function. Individuals are 
selected for reproduction, and offspring are accepted into 
the population, if they score high in that function. The idea 
is that evolution thus gradually discovers better and better 
individuals, until it finds some that optimize the chosen 
objective. 

Even though objective-based search is effective in many 
cases, there are two problems with it that are especially relevant 
in artificial life. First, it is not always clear how the 
objective function should be defined. The desired behavior 
may consist of many aspects that interact (such as speed, energy 
consumption, effectiveness, quality of the result), and 
some of them may be difficult to express formally (such as 
believability, creativity, elegance). Second, the domains are 
often deceptive, i.e. optimization requires creating individuals 
that do not perform well, but can serve as stepping stones 
in constructing those that do. This effect can be seen in many 
cognitive tasks that require memory, learning, or communication 
(Lehman and Miikkulainen, 2014), but it is also clear 
in the process of creating interesting images (Secretan et al., 
2011). 

Divergent search methods have recently emerged, mostly 
in the field of artificial life, as a potential solution to these 
issues. The idea is not to incrementally approach an optimum 
of a specified fitness function, but instead to create as 
much diversity in the search as possible. This idea has 
been expressed in several forms, including empowerment 
(Salge et al., 2013), entropy maximization (Wissner-Gross 
and Freer, 2013), and behavioral diversity (Mouret and Doncieux, 
2012). The particular formulation used in this paper 
is novelty search (Stanley and Lehman, 2015; Lehman and 
Stanley, 2010a), where individuals are rewarded based on 
how different they are from other individuals encountered 
so far. The novelty \rho for an individual x is defined as 

\rho(x) = \frac{1}{k} \sum_{i=0}^{k} dist(x, \mu_i),    (1) 

where \mu_i is the ith nearest neighbor of x according to the 
distance metric dist. Note that distance is measured in the 
phenotypic, i.e. behavioral, space, not in the genotypic space. 
The motivation comes from biology: behavioral niches that 
are novel will survive, regardless of what their genetic 
coding is. 
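
The novelty measure can be sketched in a few lines. The version below averages the distance from a behavior to its k nearest neighbors, which is the usual reading of this definition; the Euclidean default, the function names and the representation of behaviors as tuples of numbers are assumptions made for the example, since the distance metric is domain-specific.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two behavior vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def novelty(behavior, others, k, dist=euclidean):
    """Mean distance from `behavior` to its k nearest neighbors in `others`.

    `others` holds the behaviors encountered so far (for example the
    current population plus an archive).
    """
    if k < 1 or not others:
        raise ValueError("need k >= 1 and at least one other behavior")
    nearest = sorted(dist(behavior, o) for o in others)[:k]
    return sum(nearest) / len(nearest)
```

An individual whose behavior lies far from everything seen so far receives a high score, regardless of how well it performs on the task itself.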


Novelty search can be surprisingly effective in solving 
problems, especially those that are deceptive. For instance, 
evolving a robot to run through a simulated maze, using distance 
to the goal as the fitness, is easy if the maze is relatively 
simple. However, dead ends close to the goal make 
it deceptive, and objective-based search often gets stuck. In 
contrast, novelty search will create individuals that explore 
all the different parts of the maze, and eventually will find a 
way to the goal even though such solutions require traveling 
away from it occasionally (Lehman and Stanley, 2010a). 

Novelty search also provides an interesting abstraction of 
biological evolution. There is no specific goal in biological 
evolution; instead individuals and species survive if they 
find a niche that they can exploit. Life therefore rapidly 
spreads through the available niches, leading to species that 
are highly adapted to their environment, and to tremendous 
diversity overall. Extinction events can serve to accelerate 
this process by selecting for highly evolvable individuals and 
species (Lehman and Miikkulainen, 2015). 

One aspect of biological novelty search that is not captured 
by current methods is that such search takes place 
simultaneously and in parallel across the space of solutions. 
Individuals and species do not necessarily compete with everyone 
in that space, but with those that are local to them. In 
other words, biological novelty search is distributed. It may 
result in discovering similar individuals multiple times, but 
it may also result in discovering more diverse individuals. 
This distribution is the first design principle of DANS in this 
paper. 

Combining Novelty and Fitness 

The second design principle of DANS is incorporating fitness 
as a component into divergent search. Even though 
biology may not have a goal, engineering problem solving 
does. At the very least there needs to be a mechanism for 
detecting viable solutions produced by the divergent search, 
but there may be a benefit in guiding it as well. Diversity 
and novelty are necessary for discovering the stepping stones, 
but since we know what we ultimately want to achieve, it 
may be possible to guide the search towards promising areas, 
without diluting its power. 

Several approaches for combining fitness and novelty 
have been proposed, and shown to be effective in solving 
practical problems (Gomes et al., 2015). Many of them com¬ 
bine a fitness objective with a novelty objective in some way, 
for instance as a weighted sum (Cuccu and Gomez, 2011), 
or as different objectives in a multi-objective search (Mouret 
and Doncieux, 2012). Another approach is to keep the two 
kinds of search separate, and make them interact through 
time. For instance, it is possible to first create a diverse pool 
of solutions using novelty search, presumably overcoming 
deception that way, and then find solutions through fitness- 
based search (Krcah and Toropila, 2010). A third approach 
is to run fitness-based search with a large number of objective
functions that span the space of solutions, and use novelty
search to encourage search to utilize all those functions
(Cully et al., 2015; Mouret and Clune, 2015; Pugh et al., 
2015). A fourth category of approaches is to run novelty 
search as the primary mechanism, and use fitness to select 
among the solutions. For instance, it is possible to add local 
competition through fitness to novelty search (Lehman and 
Stanley, 2011). Another version is to accept novel solutions 
only if they satisfy minimal performance criteria (Lehman 
and Stanley, 2010b; Gomes et al., 2013). 
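As a rough illustration of two of the combination schemes listed above, the following sketch contrasts a weighted sum of fitness and novelty with a minimal-criteria filter. The function names, the weight `rho`, the threshold, and the candidate scores are all assumptions made for this example; they are not taken from the cited papers.

```python
def weighted_sum_score(fitness, novelty, rho=0.5):
    """Blend normalized fitness and novelty; rho=1.0 is pure fitness (assumed weighting)."""
    return rho * fitness + (1.0 - rho) * novelty


def passes_minimal_criteria(fitness, threshold):
    """Accept a novel solution only if it reaches a minimal performance level."""
    return fitness >= threshold


# Hypothetical candidates with fitness and novelty already scaled to [0, 1].
candidates = [
    {"name": "a", "fitness": 0.9, "novelty": 0.1},
    {"name": "b", "fitness": 0.4, "novelty": 0.8},
    {"name": "c", "fitness": 0.1, "novelty": 0.9},
]

# Scheme 1: rank by the weighted sum.
by_blend = max(candidates, key=lambda c: weighted_sum_score(c["fitness"], c["novelty"]))

# Scheme 2: novelty is primary, fitness only filters out non-viable solutions.
viable = [c for c in candidates if passes_minimal_criteria(c["fitness"], 0.3)]
most_novel_viable = max(viable, key=lambda c: c["novelty"])
```

In this toy data both schemes pick candidate "b", but for different reasons: the first trades the two scores off, whereas the second never compares fitness values beyond the threshold.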

This paper advances the techniques for combining novelty
and fitness in this fourth category, in two ways. First, novelty
search is run in each of the parallel, distributed clients,
but periodically the most fit among their most novel solutions
are harvested at the system level and injected into the
populations of the parallel searches to bias them towards
high fitness. This approach is an extension that utilizes the
distributed nature of DANS. Second, in selecting an individual
to keep from the least novel pair, each client search prefers
the fitter choice, creating a fitness bias within each search.
The results show that the resulting fitness-biased novelty
search is more powerful than the standard version.

Distributed Evolution Through EC-Star and Age 
Layering 

Age-layered fitness calculation is an approach suitable for
data problems in which evolved solutions need to be applied
to many fitness samples in order to measure a candidate's
fitness confidently (Hodjat and Shahrzad, 2013). Age layering
is an elitist approach: the best candidates of each generation
are retained and run on more fitness cases to improve
confidence in their fitness. The number of fitness evaluations
(i.e. samples shown) in this method depends on the
relative fitness of a candidate solution compared to others at
the current state of the search.

Note that this age-layering technique is distinctly different
from the similarly named Age-Layered Population Structure
(ALPS) method (Hornby, 2006). ALPS partitions populations
into layers according to generations, with the main goal
of maintaining diversity. Age layering in this paper is more
closely related to the Early Stopping method in evolutionary
robotics, where a complex evaluation is terminated if it is
guaranteed not to produce offspring even if evaluated fully
(Nolfi and Floreano, 2000; Bongard and Hornby, 2010;
Bongard, 2011).

EC-Star (O’Reilly et al., 2013) is a massively distributed 
evolutionary platform that uses age-varying fitness as the ba¬ 
sis for distribution, and thus makes it possible to distribute 
large data problems through sampling, hashing, and fea¬ 
ture reduction techniques. The available data is divided into 
smaller chunks, each contributing to the overall evaluation 
of the candidates. 

In EC-Star, age is defined as the number of fitness samples
upon which a candidate has been evaluated. EC-Star uses a
hub-and-spoke architecture for distribution, where the main
evolutionary process is moved to the processing clients
(Figure 1). Each client, or Evolution Engine, has its own pool
and independently runs through the evolutionary cycle. At
each new generation, an Evolution Engine submits its fittest
candidates to the server, or Evolution Coordinator, for
consideration. The submission typically takes place after each
candidate has been evaluated on a fixed number of samples,
called the maturity age.

The Evolution Coordinator maintains a list of the best 
candidates so far. EC-Star achieves scale through making 
copies of genes at the server, sending them to Evolution 
Engines for aging, and merging the aged results reported 
back by the Evolution Engines. This process also allows 
the spreading of the fitter genetic material. EC-Star is mas¬ 
sively distributable by running each Evolution Engine on a 
processing node (e.g. CPU) possibly with limited bandwidth 
and occasional availability (Hodjat et al., 2014). Typical 
runs utilize hundreds of thousands of processing units span¬ 
ning across thousands of geographically dispersed sites. 

In the Evolution Coordinator, only candidates of the same 
age-range are compared with one another (i.e. they are age¬ 
layered). Each age-range has a fixed quota, and a “shadow” 
of a candidate that has aged out of an age-layer is retained 
as a placeholder for filtering incoming candidates. In this 
manner, unreliable estimates do not dominate the evaluation 
process. 
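The age-layered comparison described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the layer width, the per-layer quota, and the dictionary-based candidate records are invented for the example, and the "shadow" placeholders for aged-out candidates are omitted.

```python
from collections import defaultdict

LAYER_WIDTH = 128  # assumption: one layer per maturity-age worth of samples
QUOTA = 2          # assumption: fixed capacity of each age layer


def layer_of(age):
    """Map an age (number of samples seen) to its age-layer index."""
    return age // LAYER_WIDTH


def admit(layers, candidate):
    """Insert a candidate into its own age layer and keep only the QUOTA fittest there.

    Candidates are compared only with others in the same layer, so a young
    candidate with a lucky estimate cannot displace an older one.
    """
    layer = layers[layer_of(candidate["age"])]
    layer.append(candidate)
    layer.sort(key=lambda c: c["fitness"], reverse=True)
    del layer[QUOTA:]
    return any(c is candidate for c in layer)


layers = defaultdict(list)
admit(layers, {"id": 1, "age": 128, "fitness": 100})
admit(layers, {"id": 2, "age": 128, "fitness": 110})
admit(layers, {"id": 3, "age": 1024, "fitness": 90})
accepted = admit(layers, {"id": 4, "age": 200, "fitness": 105})
```

Here candidate 4 displaces candidate 1 within the same layer, while candidate 3 keeps its place in an older layer even though its fitness value is lower than that of the younger candidates.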

EC-Star with Age Layering is well suited for implement¬ 
ing DANS, as will be described next. 

Distributed Age-Layered Novelty Search 

In the DANS approach, the Evolution Engines (clients) are
configured to use novelty search for their parent selection,
while the Evolution Coordinator (server) continues to make
use of fitness to decide which individuals to allow in the
server pool. Evolution Engines still receive individuals from
the Evolution Coordinator for aging, and these individuals are
added to the local elitist pool, participating as parents in
creating the next generation. Each Evolution Engine, however,
operates solely on the basis of novelty rather than fitness,
selecting the most novel individuals in the local pool as
parents. The algorithm running in each Evolution Engine is
summarized in Table 1.

For each sample in the data set, a hash representation of
each individual's behavior is logged. Each individual receives
a sample from the data set as its input and generates
an output that defines the action to be executed. For example,
in the 11-multiplexer problem, where there are eight
actions, each referring to one of the data bits in the
multiplexer, the behavior logged is the address of the data bit that
the individual outputs for the given input.

Figure 1: Implementation of Distributed Age-Layered Novelty
Search (DANS) on the EC-Star System. DANS consists
of a number of Evolution Engines running novelty search
and a single Evolution Coordinator in a hub-and-spoke
arrangement. Each candidate in each Evolution Engine is
evaluated with a fixed number of samples (called the maturity
age); each Engine then sends its most novel individuals
to the Evolution Coordinator. The Coordinator maintains a
list of these highly novel individuals ordered by fitness, and
sends the most fit ones back to other Evolution Engines for
further evaluation (i.e. aging) and evolution. In this manner,
multiple novelty searches execute in parallel, exploring
the space with more diversity, benefiting from each other's
discoveries, biased towards areas with higher fitness.

After maturity age, individuals judged to be the most
novel are selected for the elitist pool. Instead of a global
archive, novelty is measured in the current population so
that every individual's behavior log is based on the same
set of examples. Using this log as Cartesian coordinates,
Euclidean distances between individuals are calculated, and
one of the individuals in the pair of individuals nearest one
another is eliminated. This process is repeated until the
quota for parents in the elitist pool is met. The resulting
set of parents is then used to create the next generation.

Two different versions are implemented in forming the 
elitist pool. The first one is based purely on novelty: in the 
pair of most similar individuals, the one that’s nearer to at 
least one of the other individuals in the pool is removed. 
The second one is based partly on fitness: in that pair, the 
one with the lower fitness is removed, thus subtly biasing 
the system to favor genes with better fitness. These two ver¬ 
sions, called Pure Novelty and Hybrid, will be compared in 
the experiments that follow. 
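The two ways of forming the elitist pool can be sketched as below. This is a minimal illustration, not the EC-Star implementation: the record layout (`behavior`, `fitness`), the function names, and the tie-breaking are assumptions, and the behavior vectors are tiny two-dimensional stand-ins for the per-sample behavior logs.

```python
import math
from itertools import combinations


def distance(a, b):
    """Euclidean distance between two behavior logs."""
    return math.dist(a["behavior"], b["behavior"])


def prune_to_elite(pool, quota, hybrid):
    """Repeatedly remove one member of the closest pair until `quota` individuals remain."""
    pool = list(pool)
    while len(pool) > quota:
        a, b = min(combinations(pool, 2), key=lambda pair: distance(*pair))
        if hybrid:
            # Hybrid version: drop the less fit member of the most similar pair.
            loser = a if a["fitness"] < b["fitness"] else b
        else:
            # Pure-novelty version: drop the member lying nearer to some third individual.
            others = [c for c in pool if c is not a and c is not b]

            def nearest_other(x):
                return min((distance(x, c) for c in others), default=math.inf)

            loser = a if nearest_other(a) < nearest_other(b) else b
        pool = [c for c in pool if c is not loser]
    return pool


# Hypothetical pool of four individuals.
pool = [
    {"id": "p", "behavior": (0, 0), "fitness": 5},
    {"id": "q", "behavior": (1, 0), "fitness": 9},
    {"id": "r", "behavior": (5, 0), "fitness": 7},
    {"id": "s", "behavior": (9, 0), "fitness": 3},
]
hybrid_elite = [c["id"] for c in prune_to_elite(pool, quota=3, hybrid=True)]
pure_elite = [c["id"] for c in prune_to_elite(pool, quota=3, hybrid=False)]
```

In this toy pool the closest pair is (p, q). The hybrid rule removes p because it is less fit, whereas the pure-novelty rule removes q because q is closer to r than p is.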


1. Receive a batch of individuals from the Evolution Co¬ 
ordinator. 

2. If the pool has capacity, fill it with randomly generated 
individuals. 

3. Test each individual in the pool on a maturity-age 
number of fitness samples (each individual is run on 
the same sample as the others), and construct the rep¬ 
resentation of its behavior. 

4. The individuals received from the Coordinator have now 
been evaluated with more samples than before: report 
the results back to the Coordinator. 

5. Calculate the minimum of pair-wise distances of all 
individuals in the pool and discard one of that pair 
based on distance from all other genes (the pure- 
novelty version), or fitness (the hybrid novelty/fitness 
version). Do this until only the elitist percentage of 
individuals remain. 

6. Report the most novel individuals (i.e. the elitists from 
prior step) to the Coordinator. 

7. Refill the pool by applying crossover and mutation 
operators on the elitist genes from Step 5. 

8. Go to 1. 


Table 1: The Evolution Engine Algorithm, i.e. the sequence 
of steps in advancing evolution for one generation. 


Experiments 

DANS was tested on two practical design optimization prob¬ 
lems: the 11-Multiplexer and the eight-input sorting net¬ 
work. Each problem is described first, with its experimental 
setup, and then results. 

The 11-Multiplexer Domain 

Multiplexer functions have long been used to evaluate machine
learning methods because they are difficult to learn
but easy to check. In general, the input to the multiplexer
function consists of $u$ address bits $A_v$ and $2^u$ data
bits $D_v$, i.e. it is a string of length $u + 2^u$ of the form
$A_{u-1} \ldots A_1 A_0 D_{2^u-1} \ldots D_1 D_0$. The value of the
multiplexer function is the value (0 or 1) of the particular data
bit that is singled out by the $u$ address bits. For example, for
the 11-Multiplexer, where $u = 3$, if the three address bits
$A_2 A_1 A_0$ are 110, then the multiplexer singles out data bit
number 6 (i.e. $D_6$) to be its output (Figure 2).
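The definition above translates directly into a few lines of code. The sketch below assumes the input is given as a bit string in the order $A_{u-1} \ldots A_0 D_{2^u-1} \ldots D_0$; the function name and the string encoding are choices made for this example.

```python
def multiplexer(bits, u=3):
    """Return the data bit selected by the u address bits.

    `bits` is a string of length u + 2**u laid out as
    A_{u-1} ... A_0 D_{2^u-1} ... D_0 (address bits first).
    """
    assert len(bits) == u + 2 ** u
    address = int(bits[:u], 2)   # e.g. "110" -> 6
    data = bits[u:]              # data[0] is D_{2^u-1}, data[-1] is D_0
    return int(data[len(data) - 1 - address])


# Address 110 selects D_6; the data bits below are written D_7 ... D_0.
selected_one = multiplexer("110" + "01000000")   # D_6 = 1
selected_zero = multiplexer("110" + "10111111")  # D_6 = 0
```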

A Boolean function with $u + 2^u$ arguments has $2^{u+2^u}$
rows in its truth table. Thus, the sample space for the
Boolean multiplexer is of size $2^{u+2^u}$. When $u = 3$, the
search space is of size $2^{2^{11}} = 2^{2048} \approx 10^{616}$. However,
since evolution can also generate redundant expressions that
are all logically equal, the real size of the search space can
be much larger, depending on the representation.
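The sizes quoted above can be checked with exact integer arithmetic. The variable names in this short sketch are chosen for the example only.

```python
import math

u = 3
n_inputs = u + 2 ** u        # 11 input bits
n_rows = 2 ** n_inputs       # 2048 rows in the truth table (the sample space)
n_functions = 2 ** n_rows    # 2^2048 Boolean functions (the search space)

# Order of magnitude of 2^2048 in base 10.
decimal_exponent = math.floor(n_rows * math.log10(2))
```

The exponent comes out as 616, matching the $10^{616}$ figure in the text.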

Figure 2: The 11-Multiplexer. The three address bits (top)
specify one of the eight data bits (bottom) whose value will
then be output. Multiplexers are a good test domain for
design optimization because the search space is very large, but
it is easy to check whether a design is valid.

Following prior work on the 11-Multiplexer problem
(Shahrzad and Hodjat, 2015), a rule-based representation
was used where each candidate specifies a set of rules of
the type

< rule > ::= < conditions > < action > .

The conditions specify values on the bit string and the action
identifies the index of the bit whose value is then output. For
instance, the following rule outputs the value of data bit 6
when the first three bits are 110:

< A_0 = 0 & A_1 = 1 & !A_2 = 0 > D_6 .

These rules are evolved through the usual genetic operators
in genetic programming (Berlanga et al., 2010).
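To make the rule format concrete, the sketch below evaluates a rule set on an 11-bit sample. Several details are assumptions made for illustration rather than the representation of the cited work: conditions are encoded as (bit name, value, negated) triples, a negated condition holds when the bit differs from the stated value, and the first matching rule determines the action.

```python
def unpack(bits, u=3):
    """Name the bits of a sample laid out as A_{u-1}..A_0 D_{2^u-1}..D_0."""
    names = [f"A{i}" for i in reversed(range(u))]
    names += [f"D{i}" for i in reversed(range(2 ** u))]
    return dict(zip(names, (int(b) for b in bits)))


def rule_matches(conditions, sample):
    """Each condition is (bit name, value, negated); all of them must hold."""
    return all((sample[name] == value) != negated
               for name, value, negated in conditions)


def run_rules(rules, bits):
    """Return (action, output value) of the first matching rule, or (None, None)."""
    sample = unpack(bits)
    for conditions, action in rules:
        if rule_matches(conditions, sample):
            return action, sample[action]
    return None, None


# A rule that fires for address bits A2 A1 A0 = 110 and outputs data bit 6.
rules = [([("A0", 0, False), ("A1", 1, False), ("A2", 0, True)], "D6")]

hit = run_rules(rules, "110" + "01000000")    # address 110, D6 = 1
miss = run_rules(rules, "010" + "01000000")   # address 010, no rule fires
```

The action name returned for each sample ("D6" here) is the kind of value that would be logged as behavior for novelty, while the output bit is what counts towards fitness.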

In the 11-multiplexer experiments, each evolution engine
has a pool size of 4000, an elitist percentage of 5%, and a
maturity age of 128. That is, in each generation, each of the
4000 candidates is evaluated once with 128 randomly chosen
multiplexer input samples, for a total of 512,000 evaluations
per generation. At the top age-layer each candidate
has thus seen 2048 samples. Crossover combines subsets of
rules of each parent; mutations modify components of each
rule. Fitness is defined as the number of samples an individual
processes correctly, outputting the value of the data bit
specified by the address bits in the 11-bit input sample. The
novelty measure is the data bit address output by the
individual for each sample. Each evolutionary run continues
until a valid multiplexer is found, i.e. one that outputs the
correct bit for every possible combination of address bits.

Experiments were run comparing non-distributed runs to
distributed runs with eight evolution engines per run. Two
versions of distributed runs were compared: those with
pure-novelty elitism, and those with hybrid novelty/fitness
elitism. The non-distributed runs were implemented as hybrid
runs on a single evolution engine, using the same parameters
as the distributed version. Each experiment was
repeated ten times, and the results averaged.



Figure 3: Performance of Hybrid and Pure-Novelty DANS
vs. Non-Distributed Novelty Search on the 11-Multiplexer
Problem. The plot shows the average and standard deviation
of the number of generations to find a valid solution. DANS
significantly outperforms non-distributed novelty search, and
the hybrid version of DANS outperforms the pure novelty
version. The speedup is approximately linear in the number of
Evolution Engines, suggesting that DANS is an effective way
to parallelize novelty search.

11-Multiplexer Results 

The results are summarized in the bar graph shown in Fig¬ 
ure 3. The main conclusion is that DANS significantly out¬ 
performs the non-distributed runs; within DANS, the hybrid 
elitism outperforms pure novelty. The hybrid version found 
a valid solution in 27.5 generations on average, pure novelty 
in 89.1 generations, and non-distributed evolution in 242.2 
generations. Thus, distribution across the eight Evolution 
Engines makes evolution more reliable and speeds it up sig¬ 
nificantly, i.e. approximately linearly in the number of Evo¬ 
lution Engines. 
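The speedup implied by these averages can be worked out directly. The short calculation below uses only the generation counts reported above; measuring speedup as a ratio of average generations is an assumption of this sketch.

```python
# Average generations to a valid 11-multiplexer, as reported in the text.
generations = {"hybrid": 27.5, "pure_novelty": 89.1, "non_distributed": 242.2}
engines = 8

# Speedup relative to the non-distributed run, measured in generations.
speedup = {name: generations["non_distributed"] / g for name, g in generations.items()}
hybrid_speedup = round(speedup["hybrid"], 2)
pure_speedup = round(speedup["pure_novelty"], 2)
```

With eight Evolution Engines, the hybrid version needs roughly 8.8 times fewer generations than the non-distributed run, and the pure-novelty version roughly 2.7 times fewer.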

The Sorting Network Domain 

The second experimental domain is minimization of eight- 
input sorting networks. A sorting network of n inputs is a 
fixed layout of comparison-exchange operations (compara¬ 
tors) that sorts all inputs of size n (Figure 4; Knuth 1998). 
Since the same layout can sort any input, it represents an 
oblivious or data-independent sorting algorithm, that is, the 
layout of comparisons does not depend on the input data. 
The resulting fixed communication pattern makes sorting 
networks desirable in parallel implementations of sorting, 
such as those in graphics processing units, multi-processor 
computers, and switching networks (Kipfer et al., 2004; 
Baddar, 2009; Valsalam and Miikkulainen, 2013). 

Figure 4: A Four-Input Sorting Network. This network
takes as its input (left) four numbers, and produces output
(right) where those numbers are sorted (small to large, top to
bottom). Each comparator (connection between the lines)
swaps the numbers on its two lines if they are not in order,
otherwise it does nothing. This network has four layers and
five comparators, and is the minimal four-input sorting
network. Minimal networks are generally not known for input
sizes larger than eight, and designing them is a challenging
optimization problem.

Beyond validity, the main goal in designing sorting networks
is to minimize the number of layers, because it determines
how many steps are required in a parallel implementation.
A tertiary goal is to minimize the total number of
comparators in the networks. Designing such minimal sorting
networks is a challenging optimization problem that has
been the subject of active research since the 1950s (Knuth,
1998). Although the space of possible networks is infinite, it
is relatively easy to test whether a particular network is
correct: if it sorts all combinations of zeros and ones correctly,
it will sort all inputs correctly (Knuth, 1998). Sorting
networks are therefore a good domain to test the power of
evolutionary algorithms; indeed many of the recent advances
in sorting network design are due to evolutionary methods
(Valsalam and Miikkulainen, 2013). The eight-input case is
a good test case because the optimal network is known: it
has six layers and 19 comparators (Knuth, 1998).
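The zero-one test described above is easy to express in code. The sketch below is illustrative only: the network encoding (a list of layers, each a list of line pairs) and the function names are assumptions, and the four-input example is a standard five-comparator network whose layout need not match the one drawn in Figure 4.

```python
from itertools import product


def apply_network(layers, values):
    """Run the values through the comparators, layer by layer."""
    values = list(values)
    for layer in layers:
        for i, j in layer:  # comparator between lines i < j
            if values[i] > values[j]:
                values[i], values[j] = values[j], values[i]
    return values


def is_valid(layers, n):
    """Zero-one test: the network is valid if it sorts every binary input of length n."""
    return all(apply_network(layers, bits) == sorted(bits)
               for bits in product((0, 1), repeat=n))


# A standard four-input network with five comparators.
four_input = [[(0, 1), (2, 3)], [(0, 2), (1, 3)], [(1, 2)]]

valid_full = is_valid(four_input, 4)           # all 16 binary inputs are sorted
valid_truncated = is_valid(four_input[:2], 4)  # dropping the last comparator breaks it
```

For eight inputs the same test needs only 256 binary samples, which is why validity is cheap to check even though the space of networks is unbounded.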

The sorting network representation for DANS is built on 
the rule-set representation of the 11-multiplexer. Each rule 
represents a layer of comparators; each condition within 
each rule identifies the input lines of the comparator; the 
action is not used. In this manner, it is possible to evolve 
sorting networks using the same methodology as for evolv¬ 
ing rule sets. As a matter of fact, all Evolution Engine set¬ 
tings are the same as for the 11-Multiplexer experiments. In 
particular, the maturity age is 128 samples and the individu¬ 
als in the top age layer have been tested with 2048 random 
samples. 

The fitness of the network is based primarily on its ability
to sort correctly, secondarily on the number of layers, and
tertiarily on the number of comparators:

$F = aS - (2^n L + C)$, (2)

where $a$ is a proportionality constant ($10000 \cdot 2^{16}$ in these
experiments), $S$ is the number of samples the network sorts
correctly, $n$ is the number of lines (8 in these experiments),
$L$ is the number of layers, and $C$ is the number of comparators
in the network. Because all three of these goals need
to be optimized simultaneously, sorting networks represent
a more challenging and open-ended, as well as more deceptive,
domain than the 11-multiplexer.
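The fitness formula can be sketched in code to show how its three terms are prioritized. This example assumes that the layer weight is $2^n$ and that the constant $a$ is $10000 \cdot 2^{16}$; the function name and the sample values are hypothetical.

```python
def sorting_fitness(correct, layers, comparators, n=8, a=10000 * 2 ** 16):
    """F = a*S - (2^n * L + C), under the reading of the constants assumed here."""
    return a * correct - (2 ** n * layers + comparators)


optimal = sorting_fitness(2048, 6, 19)           # optimal eight-input network on 2048 samples
extra_layer = sorting_fitness(2048, 7, 19)       # one more layer
many_comparators = sorting_fitness(2048, 6, 200) # same layers, many more comparators
one_error = sorting_fitness(2047, 6, 19)         # one sample sorted incorrectly
```

With these weights a single missorted sample costs more than any realistic number of layers and comparators, and an extra layer costs more than up to 255 extra comparators, which gives the intended ordering of the three goals.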

In order to measure the novelty in sorting behavior, note
that there is no action to rely on, but instead the behavior
needs to be constructed from the structure of the network
itself. To this end, each input line is represented by 1 or a
successive prime number, i.e. 1, 2, 3, 5, 7, 11, 13, and 17. The
sorting network is run on the sample input, and for each pair
of lines that it exchanges, the corresponding numbers
are multiplied. The product of these values constitutes a
hash for the phenotypical behavior on that sample. For
example, if the sorting network has the structure

Layer 1: sort(line0, line3) and sort(line1, line2)

Layer 2: sort(line0, line4),

and the sample is 11010100 (with lines ordered 7..0), the
network will rearrange it to 11000011. The phenotypical
hash is then

(2 * 3) * (1 * 7) = 42.

A vector of these hash values for a number of samples (i.e.
the maturity age) represents the behavior of the network, and
the Euclidean distance between these vectors is used to
measure novelty.
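The worked example can be reproduced with a short script. The line codes are those listed in the text; the exchange direction (the larger value moves to the lower-numbered line) is inferred from the example's input and output and should be treated as an assumption, as should the function name and network encoding.

```python
LINE_CODES = [1, 2, 3, 5, 7, 11, 13, 17]  # codes for line0 .. line7, as listed in the text


def behavior_hash(layers, sample):
    """Return (hash, rearranged sample) for a bit string with lines ordered 7..0."""
    values = [int(b) for b in reversed(sample)]  # values[i] is the bit on line i
    h = 1
    for layer in layers:
        for lo, hi in layer:
            # Assumed direction: the larger value ends up on the lower-numbered line.
            if values[lo] < values[hi]:
                values[lo], values[hi] = values[hi], values[lo]
                h *= LINE_CODES[lo] * LINE_CODES[hi]
    rearranged = "".join(str(v) for v in reversed(values))
    return h, rearranged


# Layer 1: sort(line0, line3) and sort(line1, line2); Layer 2: sort(line0, line4).
network = [[(0, 3), (1, 2)], [(0, 4)]]
hash_value, output = behavior_hash(network, "11010100")
```

Only the (line1, line2) and (line0, line4) comparators exchange their values on this sample, giving the hash 42 and the output 11000011 stated in the text.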

The sorting network experiments were all run for 1000
generations (which takes about two hours of total CPU time
on an Intel i7 2.60GHz machine). The two versions of
DANS and the non-distributed version were then compared
along three dimensions: (1) how fast they found a valid sorting
network, (2) how many layers the best network found had, and
(3) how many comparators it had. The results were averaged
over ten runs.

Sorting Network Results 

The DANS approach found valid sorting networks in 14.4
(hybrid) and 25 (pure) generations on average, compared to
the non-distributed approach, which took 98.1 generations
on average (Figure 5). Similarly, DANS found solutions with
significantly fewer layers than the non-distributed version
(32.1), with the hybrid version significantly fewer than the
pure novelty version (7.4 vs. 15.7; Figure 6). It was also
most economical in the number of comparators: whereas
the non-distributed version used 56.2 comparators on average,
the pure novelty version used 32.2 and the hybrid version
only 21.7 (Figure 7).

Interestingly, two of the ten hybrid runs actually found 
optimal sorting networks, with six layers and 19 compara¬ 
tors, within the 1000 generations. These results suggest that 
the hybrid version of DANS could be used to discover new 
minimal networks, given sufficient computing effort. 

Discussion and Future Work 

Figure 5: Number of Generations Hybrid and Pure-Novelty
DANS and Non-Distributed Novelty Search Need to Discover
a Valid Eight-Input Sorting Network. DANS significantly
outperforms the non-distributed version, and the hybrid
version of DANS outperforms the pure novelty version.

The DANS approach can be seen as a highly robust artificial
life system in which islands of evolution are searching for
behavioral niches to fill in the search space. Occasionally
their most novel solutions migrate to a coordinator that aims
to solve a particular problem, and therefore injects guidance
into the islands in terms of the most fit of those novel
individuals.

As a practical method for problem solving, DANS finds 
better solutions significantly faster than a similar non- 
distributed search: In the test problems in this paper, the 
speedup is approximately linear in the number of Evolution 
Engines. This result is remarkable because the search prob¬ 
lem cannot be simply divided into subproblems that could 
be solved independently in parallel. Instead, the result is 
likely due to the larger total population of individuals and 
the increased diversity across multiple islands. The distribu¬ 
tion makes it possible to combine fitness with novelty search 
effectively, by incorporating it into the migration between 
the islands, as well as in the selection of elitist individuals. 
DANS thus makes it possible to apply novelty search effec¬ 
tively to practical design problems. 

Computationally, the system is scalable, and the coordinators
can be federated (Hodjat et al., 2014). The system
is also robust because it can tolerate temporarily losing its
ability to coordinate (e.g. due to communication problems
or server outages), and it can even reconstruct its list
of candidate solutions, should the data be lost at the
coordinator. The system can also tolerate the loss of evolution
engines. The approach can therefore be used to tackle
big data problems that require massive amounts of computing
to solve, such as minimizing large sorting networks or
VLSI design in general, optimizing large-scale logistics and
scheduling problems, protein folding and other biomedical
optimization problems, and in general problems where each
processing node can only have access to a subset of the data
through sampling.

Figure 6: Number of Layers Discovered by Hybrid and
Pure-Novelty DANS vs. Non-Distributed Novelty Search in
1000 Generations on the Eight-Input Sorting Network Problem.
After validity, minimizing the number of layers is the
main design goal; DANS significantly outperforms the
non-distributed version, and the hybrid version of DANS
outperforms the pure novelty version.

DANS can also be useful in dynamic problems where the 
fundamental attributes of the problem can change through 
time: the evolution engines operate based on the behavior 
of the solutions, striving to achieve maximal coverage of 
the search space, versus concentrating on subspaces defined 
by the peculiarities of the fitness landscape. DANS is thus 
a mechanism that converts a powerful principle in artificial 
life into a practical tool for solving challenging engineering 
design problems. 

Conclusion 

This paper presents DANS, a parallel distributed design for 
the novelty search algorithm, and a principled way of com¬ 
bining fitness with novelty. These extensions result in a 
system that can discover better solutions much faster than 
standard novelty search. It thereby shows how fundamental 
ideas in artificial life can be useful in problem solving in the 
real world. DANS should be most useful in finding optimal 
solutions to big-data problems such as those in engineering 
design. 

Figure 7: Number of Comparators Discovered by Hybrid
and Pure-Novelty DANS vs. Non-Distributed Novelty
Search in 1000 Generations on the Eight-Input Sorting Network
Problem. The number of comparators is the third, and least
important, component of fitness; again DANS significantly
outperforms the non-distributed version, and the hybrid version
of DANS outperforms the pure novelty version.


Abstract 

Inspired by the self-organization of growing embryos and
coordinated movement of multicellular assemblies such as the
slime mold Dictyostelium, where each cell is controlled by
the same controller (a DNA-encoded gene regulatory network),
we evolve distributed gait control mechanisms for
soft-bodied animats. The animats are made of compressible
material, with each body region capable of independent
actuation, controlled by a cell at its center. Each animat
consists of hundreds of cells uniformly distributed throughout the
body, each sharing the same artificial gene regulatory network
and aware of the state of its local neighborhood. We found
that one of the most common actuation patterns that emerged
relied on cells synchronizing their oscillations in order to
produce a rotating spiral wave spanning the whole body.
We found this type of mechanism to emerge for a wide range
of animat morphologies as well as for very different types of
initial conditions. We investigate how the evolved controllers
produce the pattern through local feedback and evaluate spiral
stability when imperfect, noisy cells are used.

Introduction 

Taking inspiration from distributed control mechanisms
observed in nature, such as the self-organization of growing
multicellular embryos and the movement of multicellular
assemblies of certain amoebae known as slime molds (e.g.,
Dictyostelium), we investigated the possibility of evolving
distributed controllers for prespecified morphologies of
soft-bodied robots that would produce gaits in a truly
decentralized manner. By dividing animat bodies into hundreds
of cells capable of communicating with their neighbors, we
were expecting to observe the evolution of some form of
autowaves organizing the gaits. Autowaves are a special type
of nonlinear waves that are known to occur in active media.
The main difference between autowaves and classical
waves is that propagation of the former occurs at the expense
of energy stored in the medium. The energy is used to trigger
the process in adjacent regions (Roska et al., 1995;
Manganaro et al., 1999). Autowaves occur in many biological
phenomena. In particular, they are essential to multicellular
development, but are also central to processes such as
propagation in nerve fibers or heart excitation.

The unexpected result of our evolutionary experiments
was the predominant type of control mechanism that
emerged. It was based on producing a very specific type
of autowave: a rotating spiral known as a spiral autowave.
Spiral autowaves are frequently observed in excitable media
(Ma et al., 2010), and have been observed to emerge in media
as different as a chemical solution of the Belousov-Zhabotinsky
reaction, cardiac tissue or neurons of the neocortex, though they
usually emerge in a chaotic, unpredictable form. In this
work, however, we were able to observe how evolution creates
controllers that self-organize into spiral autowaves anchored
at a specific location of the animat's body and produce
cyclic, sustainable gaits.

Figure 1: Overview of the evolutionary approach used to
evolve distributed controllers for simulated soft animats.

Methods 

We employed the same approach to simulating soft-bodied
animat locomotion as in our previous studies (see the
full description in Joachimczak et al., 2015): animats
are two-dimensional and are represented as a set of point
masses (corresponding to cells) connected with springs. Unlike
our earlier work, where we investigated co-evolution of
bodies and brains, here we focused solely on the design of
distributed controllers. Hence, we assumed that the morphology
of an animat is specified at the beginning of an
evolutionary run and does not change (other than the elastic
changes during locomotion). Animat shapes were specified
either as a drawing or as clip art and then algorithmically
triangulated to produce a mesh with a desired number of nodes.
Locomotion was possible owing to the local, elastic changes
to the body controlled by each cell.

During the evaluation, each cell of an animat is controlled
by a copy of the same evolved artificial gene regulatory network
(GRN) encoded in the genome (Fig. 1), with gene
expression levels changing in a continuous manner. Despite
having the same controller, cells can differ in their behaviors,
due to differences in environmental (input) signals ultimately
producing different internal states of cells. The
GRN topology was evolved using the NEAT algorithm (Stanley
and Miikkulainen, 2002), a state-of-the-art technique
for evolving network topologies. The fitness function promoted
the distance achieved by the simulated animat.

Figure 2: Examples of spiral autowave driven actuation
evolved for four different morphologies. Color shows the
current local actuation signal (red - expansion, blue -
contraction). The circular arrow indicates the direction of spiral
rotation. Videos of each animat's gait are available at:
https://goo.gl/jlOUnZ (additional links in the description of the video)

We evolved GRNs with only a single output determining
the current level of contraction or expansion of a body region
around a given cell. The main input represented the averaged
output state of the cell's neighbors. To ensure that oscillatory
activity could not start and sustain itself without receiving
signals from neighbors, we did not use any bias inputs.
We then experimented with different methods of seeding the 
initial activity by stimulating a few cells or providing mater¬ 
nal gradients. We would remove the seeding signals after a 
short period of time, so that the activity within the body had 
to sustain itself through the propagating autowaves. Finally, 
to identify the evolved mechanisms of emerging autowave 
patterns, we compared experiments in which recurrent con¬ 
nections in GRNs are allowed or disabled. 
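The role of the neighbor-averaged input and of the missing bias can be illustrated with a toy update step. Everything in this sketch is a stand-in: the `controller` function is a hypothetical bias-free unit rather than an evolved GRN, the chain topology replaces the triangulated mesh, and the synchronous update is an assumption.

```python
import math


def step(outputs, neighbors, controller):
    """One synchronous update: each cell sees the mean output of its neighbors."""
    inputs = {cell: sum(outputs[n] for n in adj) / len(adj)
              for cell, adj in neighbors.items()}
    return {cell: controller(x) for cell, x in inputs.items()}


def controller(x):
    """Hypothetical stand-in for an evolved GRN; no bias term, so controller(0) == 0."""
    return math.tanh(2.0 * x)


# A chain of four cells instead of a full mesh.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

silent = step({cell: 0.0 for cell in neighbors}, neighbors, controller)
seeded = step({0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}, neighbors, controller)
```

Without a bias input a silent body stays silent, whereas a briefly stimulated cell passes activity on to its neighbor, which is why a seeding signal is needed to start the waves.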

Results 

We found that evolution, tasked with the problem of evolving
distributed, local communication-driven controllers for soft
animats, repeatedly converged on a very simple and creative
solution that relies on producing a rotating spiral autowave
anchored in the center of the body, or even multiple synchronized
spirals in the case of elongated individuals (Fig. 2). We
also found that this simple control mechanism evolves for a
wide range of tested animat morphologies and emerges both
when a highly localized (two cells) and when a global (gradient)
seed stimulus is used to initialize the waves of cellular activity.
This suggests that the spiral patterns stem from initial
heterogeneity in cellular activity, though a single cell stimulus
was not sufficient in our experiments. In each case, what
starts as seemingly chaotic waves propagating through the
body forms, within a few hundred GRN updates, a rotating
spiral that sustains itself, often indefinitely. As the rotating
arm of a spiral sweeps through the bottom of the animat's body,
the bottommost cells are raised, producing a gait that works
both for morphologies that have a flat bottom and for
morphologies supported by appendages.

Figure 3: A rotating spiral autowave emerges from cellular
activity seeded in two cells at the center of the body. Video
available at: https://goo.gl/Oklqno

Finally, we compared a scenario in which cells can rely on
feedback of their own state with one in which cells rely on
the state of their neighbors and found that, while spiral
autowaves emerged in each case, the designs differed
greatly in their robustness to noise. In particular,
if we assumed that cells’ internal clocks are imperfect, only
the experiments in which cells communicate with neighbors
were able to produce sustainable spirals.
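
The seeding observation above (a single-cell stimulus dies out, while a heterogeneous seed sustains itself) can be illustrated with a textbook excitable medium. The sketch below is a three-state Greenberg-Hastings cellular automaton, used here only as a stand-in: it is not the GRN controller evolved in this work, and the grid size, seed positions and function names are assumptions made for illustration.

```python
# Greenberg-Hastings excitable medium (a stand-in, NOT the evolved GRN controller).
REST, EXCITED, REFRACTORY = 0, 1, 2


def step(grid):
    """One synchronous update on a bounded grid with 4-cell neighbourhoods."""
    rows, cols = len(grid), len(grid[0])
    new = [[REST] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            state = grid[r][c]
            if state == EXCITED:
                new[r][c] = REFRACTORY          # excited cells become refractory
            elif state == REFRACTORY:
                new[r][c] = REST                # refractory cells recover
            else:
                neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if any(0 <= i < rows and 0 <= j < cols and grid[i][j] == EXCITED
                       for i, j in neighbours):
                    new[r][c] = EXCITED         # resting cells fire next to an excited one
    return new


def excited_after(seed, steps, size=21):
    """Count excited cells after `steps` updates, starting from a sparse seed."""
    grid = [[REST] * size for _ in range(size)]
    for (r, c), state in seed.items():
        grid[r][c] = state
    for _ in range(steps):
        grid = step(grid)
    return sum(row.count(EXCITED) for row in grid)


# A lone excited cell sends out one ring that leaves the grid and vanishes.
single_cell = {(10, 10): EXCITED}
# An excited cell next to a refractory one breaks the symmetry; the wave curls
# around the refractory cell and keeps re-exciting the centre.
broken_front = {(10, 10): EXCITED, (10, 11): REFRACTORY}

print(excited_after(single_cell, 60), excited_after(broken_front, 60))
```

In this toy medium the symmetric seed cannot sustain activity because nothing behind the expanding ring can be re-excited, whereas the asymmetric seed forms a rotating core. This mirrors, at a much simpler level, the role the text attributes to initial heterogeneity.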

Conclusions 

While spiral autowaves are a common phenomenon in
many physical and biological systems, we see their unexpected
emergence in the context of evolving distributed gait
controllers for soft animats as an example of how artificial
evolution can surprise us and suggest an entirely new type of
design, one that would otherwise be unlikely to be proposed
by a human designer. Further study will reveal how robust
this type of design is and how well it can apply to actual, 3-D
soft robots.

References 

Joachimczak, M., Suzuki, R., and Arita, T. (2015). From 
tadpole to frog: artificial metamorphosis as a method of 
evolving self-reconfiguring robots. In Proc. of the 13th 
European Conference on the Synthesis and Simulation 
of Living Systems (ECAL 2015), pages 51-58. The MIT 
Press. 

Ma, J., Wang, C.-N., Jin, W.-Y., and Wu, Y. (2010). Transition
from spiral wave to target wave and other coherent
structures in the networks of Hodgkin-Huxley neurons.
Applied Mathematics and Computation, 217(8):3844-3852.

Manganaro, G., Arena, P., and Fortuna, L. (1999). Cellular
Neural Networks: Chaos, Complexity and VLSI Processing.
Springer Berlin Heidelberg.

Roska, T., Chua, L. O., Wolf, D., Kozek, T., Tetzlaff, R., and
Puffer, F. (1995). Simulating nonlinear waves and partial
differential equations via CNN. I. Basic techniques.
IEEE Transactions on Circuits and Systems I: Fundamental
Theory and Applications, 42(10):807-815.

Stanley, K. O. and Miikkulainen, R. (2002). Evolving neural
networks through augmenting topologies. Evol. Comput.,
10(2):99-127.


Self-organized control of a tendon driven arm by differential extrinsic plasticity


Georg Martius¹, Rafael Hostettler², Alois Knoll², and Ralf Der³

¹ IST Austria, Am Campus 1, 3400 Klosterneuburg, Austria
² Institut für Informatik VI, TU München, Boltzmannstr. 3, 85748 Garching bei München, Germany
³ Max Planck Institute for Mathematics in the Sciences, Inselstr. 22, 04103 Leipzig, Germany

gmartius@ist.ac.at


The self-determined cognitive development of high-complexity
autonomous robots is a challenging task for both
the creation of robot-human ecosystems and the creation of
artificial life systems with real, human-like robots.
Anthropomimetic robots are a prominent example of this challenge.
Unlike classical robots, anthropomimetic robots are
built following the morphology of the human body. Such
robots are softer than classical systems, making them
safer to interact with and thus favorable for service robots
in human environments. Moreover, because of their human-like
morphology, they can be used for better understanding
human behavior generation and development.

Worldwide, several such muscle-tendon driven (MTD)
systems have already been built. While they are mechatronically
at an advanced level, the control of both MTD and soft robotic
systems in general is still in its infancy. A generic example
is given by pertinent EU projects ranging from CRONOS to
ECCEROBOT to MYOROBOTICS. While excellent work
has been done in building these robots, their control faces
many problems. Learning of control policies becomes essential
and is investigated mainly in the reinforcement learning
setting.

Without a very compact parametrization, learning a new
behavior takes a very long time in high-dimensional systems.
This situation clearly calls for new controller
paradigms which optimally exploit the physical properties of
such soft systems, as indicated by embodied AI. This paper
presents an approach that includes the world (i.e., body plus
environment) more actively and more systematically in the
control process than other embodied control approaches.
By inverting the roles of the controller and the controlled,
the world becomes not only “its own best model” (Rodney
Brooks’s idea) but is leveraged to become “its own best
controller” (Der and Martius, 2016). This idea can be implemented
by a neural network with a novel synaptic plasticity
rule (Der and Martius, 2015, 2016), as shown in Fig. 1.
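
As a rough illustration of the plasticity rule summarised in the caption of Fig. 1 (sensor derivatives multiplied by virtual motor values obtained from the inverse model applied to the next input's derivative), one update step might look like the sketch below. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `dep_step`, the finite-difference derivatives, the decay term and the time constant `tau` are all hypothetical choices made for the example.

```python
def dep_step(C, x_prev, x_now, x_next, M, tau):
    """One hypothetical update of the controller weights C (list of rows).

    x_prev, x_now, x_next: consecutive sensor vectors.
    M: inverse model matrix mapping sensor derivatives to virtual motor values.
    tau: time constant of the synaptic dynamics (assumed, in update steps).
    """
    # Finite-difference estimates of the sensor derivatives (an assumption).
    dx_now = [b - a for a, b in zip(x_prev, x_now)]
    dx_next = [b - a for a, b in zip(x_now, x_next)]
    # Virtual motor values: inverse model applied to the next input's derivative.
    y_virtual = [sum(m * d for m, d in zip(row, dx_next)) for row in M]
    # Differential Hebbian product plus a decay toward zero (decay is assumed).
    return [
        [c_ij + (y_i * dx_j - c_ij) / tau for c_ij, dx_j in zip(row, dx_now)]
        for row, y_i in zip(C, y_virtual)
    ]


# Two sensors, two motors, identity inverse model (one-to-one mapping).
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
updated = dep_step(zeros, [0.0, 0.0], [1.0, 0.0], [1.0, 2.0], identity, tau=1.0)
print(updated)
```

The point of the sketch is only that the weights are driven by correlations between what the sensors did and what they do next, so the body and environment shape the controller directly.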

The novelty of the controller can be demonstrated best by 
applying it to MTD systems, for instance the Myo-robotics 
arm, reported here, with its ball and socket shoulder joint 
and 9 muscles in total. Different from classical robots with 



Figure 1: Neural controller network connected to the
Myo-robotic arm. The inset on the right illustrates the
synaptic plasticity, called differential extrinsic plasticity,
which is driven by a modified differential Hebbian law,
multiplying the time derivatives of the incoming sensor values $\dot{x}$
with the virtual motor values $\tilde{y}$, which are generated by the
inverse model ($M$, one-to-one mapping in the case of the
arm) from the next input’s derivative $\dot{x}'$.


revolute joints, the motor positions do not directly translate
into joint angles and into poses. Due to the elasticity
of the muscles, there are infinitely many combinations
of motor positions for a single arm pose. Apart from that,
the most challenging task is to avoid dislocation of the
shoulder, which cannot happen with revolute joints. Astonishingly,
although structurally extremely simple, the new
control paradigm does not have problems with these particularities.
For instance, the tendons are kept tight automatically,
such that no dislocation occurs. When embedding
our controller (see Fig. 1) into the sensorimotor loop, a
meta-system is created, consisting of the mechanical system, the
controller with its sensor-driven synaptic dynamics, and the
energy supply (battery). It displays a rich behavioral
spectrum including limit cycle attractors, long-lived transients,
and fixed point flows generating pseudo-random sequences
of poses. The concrete behavior is not given explicitly,
but specific behaviors develop by themselves in a
dynamical interplay between controller dynamics and world
dynamics. This open physical system is like a reservoir of
meta-stable behavior patterns waiting to be excited. Excitation
can be achieved either by manual interaction (see
below) or by the self-amplification of latent modes spontaneously
arising in physical subsystems, see Video 1 (see
playfulmachines.com/MyoArm-2). By way of example,
consider Video 3, where a weight (a bottle) was suspended
from the tip of the arm with a string, forming a physical
subsystem. In the beginning, minimal motor activities
are seen to spontaneously excite minor pendulum motions.
These oscillations directly exert physical forces on the arm
which propagate via the springs into the sensor values and
eventually into the synaptic dynamics which governs the behavior.
This may lead to the amplification of latent pendulum
modes until a stable circular movement of the pendulum
is achieved. These findings elucidate how a physical subsystem
(the pendulum) may pilot, by its internal dynamics,
the meta-system into a resonant state, i.e., a whole-system
mode with defined frequency.

Actually, this is the essence of the method, which explains
the emergence of specific modes of the system, specific for
the actual physical setting. For instance, when attaching
a bottle half-filled with water to the tip of the arm in either
horizontal or vertical orientation, stable shaking modes
arise, as demonstrated by both Video 6 and Video 7,
showing modes specific to each physical setting (horizontal
or vertical bottle). Again, we see how the meta-system
may become resonant with the internal dynamics of a subsystem
if the latter provides perceivable correlations over
space and time. This is the case, for instance, when the water
is hitting either the walls or the top and bottom of the bottle.
These impacts cause a reaction of the springs and hence of
the sensor values, which may increase correlations in the
synaptic dynamics, resulting in enhanced motions of the arm
in coherence with these signals.

By this compliance mechanism, the “brain” may also discover
(dynamical) affordances of the physical world it is
interacting with. In further experiments, the robot is connected
to a revolvable bar or wheel with weights that give it
some moment of inertia. In Video 8 the robotic arm finds a
behavior rotating the wheel from an initial push by the user.
When positioning the wheel in parallel to the arm, the modes
emerged even more readily, as seen in Video 9. Moreover,
the system can immediately be switched between the
forward and backward rotation modes. This is possible because
the time-scale of the synaptic plasticity is so fast, on
the order of one second, that the new dynamics quickly
propagates into the controller via the plasticity rule. By
changing a time-scale meta-parameter, the frequency of rotation
can be adjusted, see Video 10. The spontaneous emergence
of the wheel rotation behavior can be argued to be a
cognitive act if we consider, in the sense of (radical) embodied
cognitive science, that cognition is to be described
in terms of agent-environment dynamics and not in terms of
computation and representation.

In another experimental situation, the robot is equipped
with a brush and forced by manual guidance to wipe a table.
Video 11 demonstrates how, by the combination of
the limiting table plane and the manual force, the robot is
driven into a two-dimensional wiping mode. The robot is seen to
slowly wander through different wiping modes by the dynamics
of the meta-system. Again, manual interaction
with the arm using small forces is always possible, as seen later
in the video. This is due to the tight closed-loop control
and the property of the synaptic dynamics to be compliant
to external perturbations, see Der and Martius (2015). Most
importantly, emerging motion patterns can be identified and
stored by the user simply by taking snapshots of the synaptic
weights. Video 12 shows the recall of previously acquired
wiping modes. The transition between different modes is
achieved by hard switching of the fixed controller weights;
nevertheless, smooth transients are observed.

In summary, we have treated a soft, high-complexity
robotic system with a novel goal-free exploratory control
algorithm. It inverts the roles of the controller and
the controlled and makes a set of non-trivial and highly
coordinated behaviors emerge solely from the interaction of
synaptic dynamics, neural transmission and the mechatronic
system. It provides a systematic approach for behavioral
self-organization that avoids the reality gap, as demonstrated
by our applications to both simulated and real robots. In this
way it can help to lift Artificial Life ecologies to a new level
of complexity approaching the physical reality of, say,
human-robot ecologies. The new controller may speed up evolution
enormously (Der and Martius, 2015), as the emergence of
a new trait needs only a mutation in morphology, with
adequate behaviors coming for free. Also, the controller is
fully deterministic, revealing that behavioral proliferation
can be the result of spontaneous symmetry breaking. Seen
as a practical approach to generate complex, force-sensitive
interactions with the environment, this controller could also
augment the repertoire of classical controllers. Additionally,
it may shed light on how biological musculoskeletal systems
generate the complex trajectories they use to interact with
the environment with an unrivalled flexibility: not as a
heavily controlled process but as an emergent phenomenon.

Supplementary material: playfulmachines.com/MyoArm-2


Abstract

It has been shown that manipulation of objects by 3D virtual
creatures can play an important role in the evolution of
complex, embodied sensorimotor behaviours. In this work
we examine the capacity of virtual creatures that use evolutionary
and control architectures already shown to be capable
of sensor-differential gradient-following locomotion
(tropotaxis) to adapt to solve a physical problem involving
the manipulation of 3D objects in their environments. Specifically,
the creatures’ task is to guide a physically-modelled
cube through their environments in order to achieve maximum
covered distance of the object. Agents were evolved in
the manipulation environment from random initial genotypes
and from genotypes previously optimised for performance in
a different task. Performance was evaluated both before and
after evolutionary adaptation. We show that the architecture
achieves embodied feedback control in the block movement
task. We observed some overlap between the earlier and later
environments but also that success in the first environment
does not preclude or entail success in the second. We found
that species evolving from scratch do no better or worse than
those optimised for a different environment, and that sensory
feedback is necessary for correct approach and control behaviours
in agents, although close control is less dependent
on sensory input than distance approach.

Introduction 

The evolution of virtual creatures in physically simulated
three-dimensional worlds was first demonstrated in 1994 in
the work of Karl Sims, who first evolved articulated agents
to swim, walk, jump or follow a light source (Sims, 1994b)
and then evolved such agents to compete to gain control of
an object (Sims, 1994a). The diverse range of strategies and
counter-strategies evolved through the latter task demonstrated
both co-evolution’s ability to generate increasingly
complex behaviours and that object manipulation can play
an important role in the evolution of sensorimotor intelligence
(beyond mere locomotion and taxis) in simulation, as
in nature.

The 3D River Crossing (3D RC) task, first presented in
Stanton and Channon (2015), provides an ideal base from
which the evolution of sensorimotor intelligence and related
issues of physical embodiment can be explored. In that work
we adapted the shunting model of Grossberg (1988) and
Yang and Meng (2000), used for the first time in an a-life
context in Robinson et al. (2007), to build an evolutionary
environment able to evolve control architectures of 3D virtual
creatures that exhibit both reactive and deliberative behaviours.
However, the problem-solving aspect of the 3D
RC task in that work was abstracted from the physicality of
the agent’s morphology. Although each agent’s joint motors
were driven by some of the outputs from its neurocontroller,
other neural outputs only notionally represented manipulation
of physical objects in the agent’s world.

An important extension of the earlier work into richer
interactions is thus to introduce aspects of the deliberative
problem to the agents’ physical world, requiring an intricate
manipulation of simulated objects to solve the challenge. In
this work, we take a first step toward that goal by investigating
whether the neural architecture outlined in that work
can successfully constitute the control system for a simple
manipulation task: displacement of a physically-modelled
block in the agent’s world, requiring feedback control, hereafter
called the block displacement (BD) task.

Our general approach is to consider populations of agents
in a new environment that provides the physical block challenge.
The agents’ neural control systems are sensitised
to the location of the block by direct interaction with the
shunting model, simplifying the adaptive problem. We investigate
evolution on the BD task from both random (unevolved)
populations and from populations of creatures previously
evolved in the 3D RC environment. Hereafter we
refer to random populations as unevolved populations and
populations evolved only in the 3D RC environment as naive
populations.

Hypotheses 

The objective of this work is the evolution of agents able to
successfully complete the BD task, as observed through 3D
visualisation. In addition, we developed and tested the following
hypotheses in order to further understand the interactions
between the various components of the system and
explore the limitations of the 3D RC architecture:

H1. The hybrid architecture is sufficient to achieve feedback
control that allows agents to successfully manipulate and
guide an external object;

H2. There is some overlap between the earlier 3D RC task
and the BD task due to the requirement for speedy and
accurate movement in both environments;

H3. Species evolved in the 3D RC task show increased performance
after evolution in the BD environment (i.e., it is
possible to optimise this behaviour further), and success
in the 3D RC environment does not preclude success
in the BD environment;

H4. Some 3D RC species are on evolutionary trajectories
more suitable for the BD task than others.

The remainder of the paper presents an overview of the
method used to generate the agents, and the results of the
evolutionary and ablative experiments designed to test the
above hypotheses. We then present conclusions and a discussion
that relates the design of the base system to the observed
results.

Methods 

In this section, we describe how the overall objective of implementing
a system capable of using an evolutionary algorithm
to produce agents able to manipulate objects in a 3D,
realistic physics world was achieved. The solution is split
into three parts. The first part is the design of the evolutionary
problem that the agent species must evolve to solve; the
second part documents the abstractions made in the agent’s
morphology and control architecture that are under the control
of the evolutionary algorithm; and the third part describes
the evolutionary algorithm itself. Finally, we describe the
data collection scheme we use to collect outputs from the
experiments.

The Physical 3D RC Problem 

The general problem used in this work is an adaptation of the
3D RC task described in earlier work (Stanton and Channon,
2015), following the same key ideas described and used to
various ends in Robinson et al. (2007) and Borg and Channon
(2011). The innovation in this work is the addition of
a requirement for agents to physically manipulate objects in
the environment; in our previous work only the body of the
agent is physically simulated and all environmental interaction
is through a two-dimensional, grid-world abstraction. In
the original RC task, 2-dimensional agents are able to move
between discrete cells in a 20x20 grid world containing hazards
(traps and water) and resources (stones and resource).
Stones can be carried by the agent and placed into water,
enabling bridges to be built. Success in this environment
is determined by agents’ ability to avoid hazards and reach
the single resource by learning an appropriate action policy
given the current state; this includes capturing an element
of deliberative planning in order to build bridges in worlds
containing an otherwise impassable stretch of water.


This task was extended in our 2015 work to three dimensions
(the 3D RC task), making the problem significantly
harder. Agents are embodied in a four-legged fixed-morphology
physical form that is simulated using a Newtonian
rigid-body mechanics system, meaning that physical
control (principally, the locomotive and orienting behaviours
required for moving between grid cells) must be
part of any solution. The agent’s position in 3D is projected
and quantised to the 2D RC world and any output from the
control architecture translates directly into motor control in
the 3D environment.

This work introduces the Physical 3D RC (P3D RC) task 
where the physical problem is extended beyond the agents’ 
control of their bodies, to the wider environment. Solutions 
to the P3D RC task involve manipulation: in addition to the 
agents’ bodies, a cube representing a stone in the world is 
also physically simulated. Any solution must use physical 
motor control to manipulate the cube into a configuration 
that allows the agent to access the resource objective. 

As a step toward the P3D RC task, we first investigate
simpler problems where agents must simply move blocks
around in the world, without the requirement to solve the
deliberative component of the RC challenge. This paper addresses
the first of these challenges, where the problem is to
move the environmental block as far as possible.

As in our earlier work and summarised here, agents have
a symmetrical quadruped body plan comprising a torso
(dimension 1.0 × 1.0 × 0.2), four upper limbs and four lower
limbs (dimensions 0.5 × 0.2 × 0.2 each). Upper limbs are attached
to the torso at each lower corner with a 2-axis constraint,
limiting the range of motion relative to the torso.
Knees connect upper limbs to lower limbs, constraining their
relative motion to a hinge. Four small sensors are also modelled
in the physical environment as fixed appendages to the
agent’s torso; this is for convenience of updating sensor values
based on their position and the sensors have no effect
on the physical operation of the agent. The physical simulator
used was Open Dynamics Engine (ODE) version 0.13.1,
using a fixed timestep of 0.01 s, friction pyramid approximation
for contact response (μ = 10.0) between agent and
the ground plane, universal error reduction (ERP) of 0.2 and
force-mixing (CFM) of 5 × 10⁻⁵. In addition to the agents, a
1 × 1 × 1 block is simulated at the centre of the environment
(ρ = 0.1). On initialisation, agents are randomly positioned
on a circle with radius 5 units from this point.

Agent Control 

Given the above problem, a strategy to solve it necessarily
requires a control architecture that receives sensor
data from the environment and produces appropriate motor
stimulation to guide the agent through the challenges
of the world. We use a bespoke, hybrid neural network
(HNN) to this end. The HNN comprises feed-forward networks
for saliency calculation from sensor data, a locally-connected,
topologically-organised shunting neural network
(Yang and Meng, 2000) for modelling the agent’s world, a
feed-forward bridge between this model and the motor control
parts of the architecture and a series of recurrent leaky
integrator networks in the style of Beer’s Continuous-Time
RNNs (Beer and Gallagher, 1992) that actually produce motor
output from the control system. We label these components
the decision network (DN), the shunting model (SM),
the physical network (PN) and the pattern generators (PG).
Together, these components are able to successfully solve
the 3D RC task, as demonstrated in Stanton and Channon
(2015).

Since the details of this hybrid architecture are elucidated
in previous work, we present only a summary of the architecture
below, along with notes on aspects that have been
modified for the present work. See figure 1 for a detailed
exposition in graphical form.

In our 2015 work, the DN and SM follow the ideas presented
in Robinson et al. (2007) closely. Together and properly
configured, they provide a neural-like encoding of a
fixed action policy relating current state (position, local objects
and carrying state) to action (preferred movement direction,
and a pick-up or put-down action). In the first part of
the present work, the focus is on physical performance rather
than the species’ capacities to learn an appropriate state-action
policy. As such, we hard-code ι-values (saliency values)
for objects in the agents’ worlds rather than learn appropriate
weights in the DN.

The PN controls the agent’s behaviour in the world. The
state transition landscape produced by the agent’s SM is
sampled at four points physically located on the agent’s body
and these values are used as input to this network. Thus,
information about desirable state transitions (in this model,
directions to move) is available to the PN and can be used
by agents to discriminate important features of the preferred
state configuration relative to the agent’s configuration. The
agent’s configuration can then be updated to climb the gradient
in the state space.
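
The landscape-and-sampling idea described above can be sketched in a few lines. The example below relaxes a shunting equation of the general form used by Yang and Meng (2000) on a small grid and then reads the resulting landscape at continuous sensor positions by bilinear interpolation. It is a sketch under stated assumptions, not the architecture's actual code: the constants `A`, `B`, `D`, the lateral weight `w`, the Euler step, the grid size and the choice of bilinear interpolation are all illustrative.

```python
def relax_shunting(inputs, steps=500, dt=0.02, A=1.0, B=1.0, D=1.0, w=0.2):
    """Euler-integrate a shunting equation on a grid of external inputs.

    dx/dt = -A*x + (B - x)*(I_plus + sum_j w*max(x_j, 0)) - (D + x)*I_minus
    All constants here are illustrative assumptions.
    """
    rows, cols = len(inputs), len(inputs[0])
    x = [[0.0] * cols for _ in range(rows)]
    for _ in range(steps):
        new = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                lateral = sum(
                    w * max(x[i][j], 0.0)
                    for i, j in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= i < rows and 0 <= j < cols
                )
                excite = max(inputs[r][c], 0.0) + lateral
                inhibit = max(-inputs[r][c], 0.0)
                dx = -A * x[r][c] + (B - x[r][c]) * excite - (D + x[r][c]) * inhibit
                new[r][c] = x[r][c] + dt * dx
        x = new
    return x


def sample_bilinear(grid, px, py):
    """Read the landscape at a continuous position (px = column, py = row)."""
    rows, cols = len(grid), len(grid[0])
    c0 = min(max(int(px), 0), cols - 2)
    r0 = min(max(int(py), 0), rows - 2)
    fx, fy = px - c0, py - r0
    top = (1 - fx) * grid[r0][c0] + fx * grid[r0][c0 + 1]
    bottom = (1 - fx) * grid[r0 + 1][c0] + fx * grid[r0 + 1][c0 + 1]
    return (1 - fy) * top + fy * bottom


# One salient object (e.g. the block) at the centre of an 11 x 11 grid.
inputs = [[0.0] * 11 for _ in range(11)]
inputs[5][5] = 10.0
landscape = relax_shunting(inputs)

# Two hypothetical sensors: the one nearer the object reads the higher value.
near = sample_bilinear(landscape, 5.5, 5.0)
far = sample_bilinear(landscape, 7.5, 5.0)
print(near > far)
```

Because activity decays with distance from the salient cell, the difference between sensor readings indicates the uphill direction, which is the signal a tropotactic controller needs.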

Actual control of the agent’s body to achieve this reconfiguration
is mediated by the PG network. The network is
an array of five three-neuron oscillator circuits, comprising
simple leaky-integrator neurons governed by a set of coupled
differential equations, modelled after those of Reil and
Husbands (2002). The PG network receives input from the
PN that perturbs the oscillating cycles, which in turn affects
the agent’s behaviour in the world. The oscillator circuits
are a given abstraction in the agents’ design, generated by a
pre-evolutionary phase that is documented in previous work
and summarised in the next section.
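
For readers unfamiliar with leaky-integrator neurons, the sketch below shows one Euler step of the standard continuous-time recurrent neural network equation, τ_i dy_i/dt = −y_i + Σ_j w_ji σ(y_j + θ_j) + I_i. It is an illustration of that general model only: the three-neuron size, the parameter values and the function name `ctrnn_step` are assumptions, and the equations actually used for the pattern generators follow Reil and Husbands (2002), which may differ in detail.

```python
import math


def ctrnn_step(y, weights, biases, taus, inputs, dt=0.01):
    """Advance neuron states y by one Euler step of size dt.

    weights[j][i] is the connection from neuron j to neuron i.
    """
    outputs = [1.0 / (1.0 + math.exp(-(y_j + b_j))) for y_j, b_j in zip(y, biases)]
    new_y = []
    for i in range(len(y)):
        synaptic = sum(weights[j][i] * outputs[j] for j in range(len(y)))
        dy = (-y[i] + synaptic + inputs[i]) / taus[i]
        new_y.append(y[i] + dt * dy)
    return new_y


# With no connections and no input, each state simply leaks toward zero.
no_weights = [[0.0] * 3 for _ in range(3)]
state = ctrnn_step([1.0, 0.0, 0.0], no_weights, [0.0] * 3, [1.0] * 3, [0.0] * 3, dt=0.1)
print(state)
```

An external input `inputs[i]`, such as a value arriving from the PN, shifts the trajectory of the circuit, which is how a perturbation of the oscillation can be modelled in this framework.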

Last, outputs from this network are used as target angles
for the various joints in the agent’s body; actual torques are
applied according to a proportional-derivative (PD) equation
based on the difference between the current and desired angles
at the joint.
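
A proportional-derivative rule of the kind mentioned above can be written in one line. The gains `kp` and `kd` and the function name are hypothetical; the source does not give the values used.

```python
def pd_torque(target_angle, angle, angular_velocity, kp=1.0, kd=0.1):
    """Torque that pulls the joint toward target_angle and damps its motion."""
    return kp * (target_angle - angle) - kd * angular_velocity


# A joint at rest, one radian short of its target, is pushed toward the target.
print(pd_torque(1.0, 0.0, 0.0, kp=2.0, kd=0.5))
```

The derivative term opposes the current joint velocity, which limits overshoot when the pattern generators change the target angle quickly.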


[Figure 1 diagram. Labels: Grid-world Environment; x-y agent sensor locations (continuous); Outputs to joint motors.]


Figure 1: Neural architecture. The agent’s 3D world, containing
the agent and the block, is discretised into a 2D grid
(1 and 2; cell-width is 1 unit in the physical model); grid
locations are given values where salient objects exist and
this is used to generate the diffusive shunting model (3).
Agents sample the landscape (4) at four different continuous
positions given by their four sensors (5) by interpolation
of values around the sensor location (6). These values
pass through a feed-forward network and affect the dynamical
trajectories of pattern generators (7) that ultimately
output values to effectors via weighted links to joint motors
(8). Links shown in red are subject to evolutionary optimisation,
both in the pre-evolutionary phase and in the later block
task. This includes the red region around the five preset pattern
generators whose interneuron weights are also variable:
within a single generator preset weights are adapted; across
generators weights are initialised at zero but can also move
from this value.

Evolutionary Algorithm

Pre-evolution. As noted above, populations exploring the
block task have been pre-evolved in other environments and
also contain specific neural circuits that were produced in an
additional, separate environment. These circuits were produced
in isolation: three-neuron motifs were evaluated for
their capacity to stably generate a 1 Hz sinusoidal oscillation
in the presence of an input signal and to be quiescent otherwise,
using an objective function based on the Fourier transform
of their output over a 10-second window. The major
pre-evolutionary phase involved the simulation of 20 species
of agent in the original 3D RC environment. These species
progressed through the documented incremental evolutionary
phases of food collection, sprinting and hazard avoidance;
the evolutionary process was halted before the deliberative
part of the incremental challenge. (Specifically, agent
populations were allowed 250k 3-individual tournaments; it
was found that all 20 species had progressed to the deliberative
component by this point.) The 20 species, all capable of
tropotactic locomotion, were then installed in the block environment.
All evaluation was carried out using a bespoke distributed
evaluation system across approximately 200 CPU
cores, achieving approximately 100 evaluations per minute.

Evolutionary Parameters. In all cases the evolutionary algorithm
is a three-individual tournament selection-based
optimisation process, operating on a population of 150
genomes. Individuals’ neuro-controllers are represented as
an array of floating-point values. On reproduction, single-point
crossover occurs between the two winning individuals
in the tournament, and Gaussian mutation (μ = 0, σ = 1) is
applied to alleles of the resulting child genome with
probability 1/l.
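
One tournament of the scheme described above might be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the child replaces the losing individual of the tournament and that l in the mutation probability 1/l is the genome length, neither of which is stated explicitly in the text. The function name and the toy fitness function are also hypothetical.

```python
import random


def tournament_step(population, fitness, rng):
    """Run one three-individual tournament in place; return the replaced index."""
    i, j, k = rng.sample(range(len(population)), 3)
    ranked = sorted((i, j, k), key=lambda idx: fitness(population[idx]), reverse=True)
    winner_a, winner_b, loser = ranked
    a, b = population[winner_a], population[winner_b]
    length = len(a)
    # Single-point crossover between the two winners.
    cut = rng.randrange(1, length)
    child = a[:cut] + b[cut:]
    # Gaussian mutation (mean 0, s.d. 1) per allele with probability 1/length.
    child = [g + rng.gauss(0.0, 1.0) if rng.random() < 1.0 / length else g
             for g in child]
    population[loser] = child  # assumption: the child overwrites the loser
    return loser


rng = random.Random(0)
population = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(150)]
best_before = max(sum(genome) for genome in population)
replaced = tournament_step(population, sum, rng)  # toy fitness: sum of alleles
print(len(population), replaced)
```

Under these assumptions the population size stays constant and the current best genome can never be overwritten, since it cannot rank last in a tournament it takes part in.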

Objective Functions. For the pre-evolution of oscillator circuits,
the objective function was the number of non-1 Hz frequencies
in the frequency domain of a ten-second sample of
the output neuron’s signal in the input-high state, and the
total number of frequencies in the input-low state. During
the pre-evolution of gradient-ascending virtual creatures, the
objective was as defined in Stanton and Channon (2015);
agents of high fitness completed many of the incremental
stages of the 3D RC task. For the evolved block-pushing
task, the objective is to maximise the distance covered by
the block in the discrete grid-world.
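
The oscillator objective described above can be approximated with a plain discrete Fourier transform. The sketch below counts frequency bins whose amplitude exceeds a threshold; the threshold, the 50 Hz sampling rate, the exclusion of the constant (0 Hz) component and the function names are assumptions made for illustration, and the score is presumably one to be minimised.

```python
import cmath
import math


def present_frequencies(signal, sample_rate, threshold=0.05):
    """Return {frequency_hz: amplitude} for DFT bins above the threshold."""
    n = len(signal)
    found = {}
    for k in range(1, n // 2):  # skip the constant (0 Hz) component
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(signal))
        amplitude = 2.0 * abs(coeff) / n
        if amplitude > threshold:
            found[k * sample_rate / n] = amplitude
    return found


def oscillator_error(signal_high, signal_low, sample_rate, target_hz=1.0):
    """Unwanted frequencies with input high, plus any frequencies with input low."""
    high = present_frequencies(signal_high, sample_rate)
    low = present_frequencies(signal_low, sample_rate)
    off_target = sum(1 for f in high if abs(f - target_hz) > 1e-9)
    return off_target + len(low)


rate = 50  # samples per second (assumed); ten seconds gives 500 samples
times = [i / rate for i in range(10 * rate)]
clean = [math.sin(2 * math.pi * t) for t in times]
distorted = [math.sin(2 * math.pi * t) + 0.5 * math.sin(2 * math.pi * 3 * t)
             for t in times]
silent = [0.0 for _ in times]

print(oscillator_error(clean, silent, rate), oscillator_error(distorted, silent, rate))
```

A ten-second window gives a frequency resolution of 0.1 Hz, so a 1 Hz oscillation falls exactly on one bin and does not leak into its neighbours.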

Data collection 

To examine H1, we collected observations of agent behaviour,
including extracting trajectory data from the
highest-scoring individual from the BD task under various
sensory ablation conditions. Deafferentation of control inputs
was achieved by systematically disabling sensors, and
the agent’s progress in a controlled version of the BD task
was recorded across a two-minute time interval. For each ablation,
we examine approach and control. In both cases the
block is positioned at (20,20); for approach the agents start
far from the block at (5,5), and for control they start very
close at (18,20). The trajectories followed by agents and
block in the two scenarios illuminate the dependence of the
gaits on sensory feedback. To examine H2, we used mean
evolutionary performance data from the final 1000 tournaments
of the 3D RC pre-evolution phase in comparison to
the mean score of the same species in the BD task, evaluated
for 2-minute and 10-minute periods (simulation time). To examine
H3, we measure the naive BD score of each individual in each
population before evolution takes place, in 10 randomly
initialised trials. Each trial evaluates the individual
for 10 minutes in the BD task. After the evolutionary phase,
we repeat the process. We also collected evaluation data for
each individual in each of 20 populations over 10 trials of
10 minutes each, after 100k tournaments of evolution starting
from random genotypes. To examine H4, we
use the evaluation fitnesses for naive and evolved species.

Figure 2: Visualisation of a single agent in the block-displacement
world. Agent is displaying a low, heavy gait
suitable for block pushing.

Results 

H1: The hybrid architecture is sufficient to achieve
feedback control that allows agents to successfully manipulate
and guide an external object. Visualisations of
agent behaviour can be seen at https://youtu.be/gZaUvXcdMK8,
and figure 2 provides a static view of an
agent. The zoopraxiscopic figures (in the style of Eadweard
Muybridge) show a time-series of snapshots that illustrate
how agents approach the block from a distance (figure 3a),
and manipulate the block in their world (figure 3b). The
sensory ablation data are presented in figure 8. Figure 8a
shows the planar trajectory followed by agents approaching
the block from a distant point under various deafferentation
conditions; figure 8b shows the response of agents to the
same sensory culling in a closer, control scenario.

H2: There is some overlap between the 3D RC task and 
the BD task due to the requirement for speedy and accu¬ 
rate movement in both environments. A non-parametric 
correlation analysis was undertaken between the species’ 
relative ranks for mean fitness during the final 1000 tour¬ 
naments of the 250k-tournament 3D RC pre-evolutionary 
runs and the mean score on the BD task. Figure 4 presents 
this correlation graphically for both two minute and ten 
minute evaluation times. In the 10m trial we found a statistically
significant although weak correlation (ρ = 0.38; H0
p < 0.05). The correlation between 3D RC and BD performance
in the 2m BD trial is much stronger (ρ = 0.51; H0
p < 0.05).

(a) Approach gait. The agent is moving toward the block from a distance. All limbs are contributing to the movement.

(b) Control gait. The agent is pushing forward with its ‘back’ limbs, maintaining the block between its forelimbs.

Figure 3: Zoopraxiscopic diagrams that show the gaits of the best evolved agent (run 11, individual 105). Presentation is in
natural reading order, left-to-right, top-to-bottom. The viewpoint is fixed but tracks the agent as it moves through the world.
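
The rank-based correlation used for H2 can be illustrated with a short sketch. It assumes that the non-parametric analysis is a Spearman rank correlation, which the text does not state explicitly; the function names and the five example scores are hypothetical and are not data from the experiments.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up scores for five hypothetical species (illustration only).
rc_fitness = [0.2, 0.5, 0.9, 0.4, 0.7]
bd_score = [30.0, 45.0, 80.0, 50.0, 60.0]
rho = spearman_rho(rc_fitness, bd_score)
```

Working on ranks rather than raw scores makes the measure insensitive to the very different scales of 3D RC fitness and BD score.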

H3: Species evolved in the 3D RC task show increased 
performance after evolution in the BD environment. 

There is a clear improvement in all cases over the 25k-tournament
evolutionary run: the mean fitness over all naive
populations was 37.29, compared to 124.16 in the evolved
set (H0 p < $10^{-10}$). Figure 5a shows progress of runs beginning
from random genotypes over evolutionary time (100k
tournaments in 1k-tournament averages). Figure 5b shows
the same view of populations beginning from naive genotypes,
over 25k tournaments. Both treatments show a levelling
off of fitness and there is no significant difference between
the evaluation performance of the two starting conditions
(figure 6).


H4: Some 3D RC species are on evolutionary trajectories
more suitable for the BD task than others. We found a
correlation between naive score and evolved scores across
the 20 species (ρ = 0.59; H0 p < 0.01) but no correlation
between the naive scores and the magnitude of the change in
fitness (ρ = 0.26; H0 p > 0.1). Figure 7 demonstrates these
relationships: Δ-fitness is uncorrelated with naive fitness.

Conclusions and Discussion 

We have shown that feedback motor control in evolved 
agents is possible with the given architecture, and that the 
architecture is flexible enough to support and adapt to a va¬ 
riety of evolutionary scenarios presented sequentially. This 
demonstrates that the platform has the potential to support 
environments that require even more sensorimotor control 
and is a reasonable starting point from which physical complexities
can be added into the 3D RC task, eventually approaching
a full physical model of the problem. Observations
of the agents’ behaviour gained through 3D visualisation
have revealed a rich variety of evolved strategies
for solving the problem. Different classes of gait for approaching
and manipulating the block appear due to the genetic
heritage of species, and it is clear that low, heavy gaits
work best for pushing the object in the BD task. From the
deafferentation studies it can be seen that these gaits are not
self-generating, blind gaits that simply aim the agent to the
target location but are more complex aggregates of sensory
data that depend on the agent’s position relative to the block
in order to successfully achieve increased displacement.

Figure 4: Across-species correlation comparing 3D RC performance
and BD performance. Outcomes across the two
tasks are more correlated when evaluation time is shorter
(ρ = 0.51), indicating that movement speed is a factor in
success in the block task and shared between the two problems.
However, a strong gait is required to push the block
and this is not selected for in the 3D RC task, hence the
lesser correlation in the 10m task (ρ = 0.38).

When we consider the two evolutionary scenarios, 3D RC 
and BD, we found some overlap between the two problems. 
A strong correlation was observed between performance in 
the BD challenge before evolution in a two-minute evalua¬ 
tion, and performance at the end of the 3D RC task, indi¬ 
cating that some components of both challenges contribute 
similarly to relative agent fitness. This is likely to be the 
speed and directness of movement in the world which has a 
greater effect in a smaller evaluation period. As the evalua¬ 
tion period grows larger, this correlation decreases indicat¬ 
ing that the block-pushing dimension of fitness in this sce¬ 
nario is not well captured in the 3D RC task and ultimately 
is the most important component. (It was also observed by 
measuring the time taken by agents to reach the block that 
most naive species sacrifice movement speed for block push¬ 
ing capability during evolution, and this aspect should be in¬ 
vestigated more thoroughly to determine whether this is an 
(a) Fitness on the BD task (moving average over a 1000-tournament
moving window) for evolution from a random (unevolved)
population.

(b) Fitness on the BD task (moving average over a 1000-tournament
moving window) for evolution from a naive (evolved
in 3D RC) population.

Figure 5: Progress of runs over evolutionary time; note that
the x-axis differs due to the different starting conditions and
number of tournaments for each treatment.


artefact or a consistent trend.) We showed that performance 
from either starting point (3D RC or unevolved genotypes) 
is comparable, demonstrating that an incremental approach 
incorporating both types of environment is possible in prin¬ 
ciple. We noted one extremely high-fitness run in the ran¬ 
dom category; upon visual inspection this species is a clas¬ 
sic degenerate solution whose strategy is to rapidly vibrate 
the block to achieve high fitness. It is possible that more 
complex environments (such as 3D RC) prevent this kind 
of trivial solution by requiring a richer agent-environment 
interface. Our results comparing BD fitness before and af¬ 
ter evolution demonstrate that whilst naive performance is 
an indicator of final performance, it is not an indicator of 
how much any particular species will improve. There is a 
risk that incrementally presenting new environments to only 
the most successful species could exclude good general so¬ 
lutions, a problem potentially mitigated by heterogeneous 
presentation of multiple environments. 


[Plot for Figure 6: "Sorted Best Individuals Per Treatment". Legend: evolved from random initial genotypes; naive agents from the RC task; evolved from naive agents.]


Figure 6: Comparison of the best individuals from the naive 
population, and from populations evolved from the random 
(unevolved) and naive-evolved populations. 

Further work Ongoing work is toward the P3D RC task: 
a physically-embodied deliberative river crossing problem. 
The next step is to consider not just displacement but also 
positioning of the block using the shunting landscape. This 
is likely to demand significant revision of the underlying 
control architecture to incorporate reasoning about relative 
positioning. Additionally, the question of whether specific 
types of solutions in the 3D RC world have specific perfor¬ 
mance profiles in the BD world could be addressed by ex¬ 
amining in detail whether some species always slow down 
and some always speed up. Additionally, it is possible that 
evolved morphology could significantly contribute to physi¬ 
cal manipulation behaviours. 
(a) Approach task. The agent begins at (5,5) and attempts to reach the block. The unaltered agent’s trajectory is shown in the top left; this
agent tends to overshoot its target and then correct by rotating, as the two loops in the path record. All sensors have some effect on this 
behaviour although sensor 1 is by far the most pronounced difference in a single cut. In complete deafferentation (bottom right) the agent 
moves randomly. 
(b) Control task. The agent begins at (18,20), adjacent to the block. The unaltered agent pushes the block in a tight circle to maximise fitness
(top left). Sensor ablations do not have a catastrophic effect as in the approach task; all single cuts still maintain block movement although 
the trajectory is less efficient, as does the dual cut of sensors 1 and 2. Only by cutting sensors 3 and 4 or complete deafferentation did we 
observe failure to displace the block at all. 


Figure 8: Agent-block trajectories of best agent from best overall trained population under various sensor ablation treatments. 
The figure demonstrates how a combination of sensory inputs is necessary for reliable gait generation for distance approach 
and to control the block. In all cases the block initially rests at (20,20). 
Abstract

In this article, we are interested in the evolution of specialisation
among a single population of heterogeneous robotic
agents in a cooperative foraging task. In particular, we want
to compare (1) the emergence and (2) fixation of genotypic
polymorphism under two different selection methods: elitist
and fitness-proportionate. We show that, while the emergence
of specialists is easy under an elitist selection, this method
cannot maintain heterogeneous behaviours throughout the
whole simulation. In comparison, a fitness-proportionate
algorithm proves to be inefficient in evolving any cooperative
strategy but ensures the conservation of heterogeneity when
it is present in the population. We then reveal through additional
experiments two key factors for the evolution of heterogeneous
behaviours in our task: (1) protection of genotypic
diversity and (2) efficient selection of partners. We finally
demonstrate this assertion and, while our main problem remains
unsolved, we provide directions on how it could be
successfully approached.

Introduction 

Task specialisation is a defining characteristic in achieving
efficient coordination and is thus considered to be crucial
in the evolution of complex cooperative behaviours (Szathmáry
and Maynard Smith, 1995). The problem of evolving
cooperation has been largely studied in evolutionary
robotics as it raises interesting perspectives for the design
of collective robotics (Trianni et al., 2007; Hauert et al.,
2010; Doncieux et al., 2015). As a consequence, the manner
in which robotic agents could evolve specialisation (or
division of labour) for a cooperative task represents a compelling
challenge in evolutionary robotics. As such, a large
body of literature has already been dedicated to this subject.
However, most research focuses on the particular case of
homogeneous groups of individuals (Waibel et al., 2009) as is
classic in evolutionary robotics. This means that the individuals
are forced to rely on phenotypical plasticity (Waibel
et al., 2006; Ferrante et al., 2015; Eskridge et al., 2015)
and/or environmental cues (Waibel et al., 2006; Goldsby
et al., 2010) in order to achieve specialisation.

In this paper, we focus on a slightly different problem: 
the evolution of a polymorphic population where division of 


labour is encoded at the genotypic level. More precisely, 
we want to study the evolution of a population containing 
two (or more) different types of genotypes. Each of these 
types of genotype should be able to encode for a differ¬ 
ent role without requiring the addition of mechanisms for 
lifetime specialisation. Thus it poses the problem of both 
evolving and maintaining genotypic polymorphism in a sin¬ 
gle population. Here we want to investigate the conditions 
under which specialised behaviours for a cooperative task 
can evolve in a single population of heterogeneous individ¬ 
uals. In particular, we are interested in the influence of the 
selection process in achieving division of labour. 

We design a two-robot cooperative foraging task where
both solitary and cooperative strategies can evolve but
where cooperation is highly rewarded. The genotype of each
robotic agent is separately chosen in the population and the
individuals therefore form a heterogeneous group. Success in
this task is greatly favoured by the evolution of efficient
coordination strategies. In particular, our previous work on a similar
task (Bernard et al., 2015) showed that two types of cooperative
strategy could evolve: one where both individuals
adopt homogeneous behaviours (generalists) and the other
one where they adopt a leader/follower strategy (specialists).
Moreover, it was shown that the latter could only emerge
between heterogeneous individuals. As it is also the more
efficient behaviour, we study the conditions for its emergence.
The evolutionary dynamics of two popular selection methods
are studied: (1) an elitist (μ + λ) evolution strategy and
(2) fitness-proportionate selection. Fitness-proportionate selection in
particular is interesting with regard to genotypic polymorphism
as it is known to allow the evolution of frequency-dependent
selection (Altenberg, 1991).

In the next Section, we introduce the experimental setup.
Then we present the two types of cooperative strategies that
can evolve. Next, we investigate whether any of the selection
methods could evolve heterogeneous behaviours. In
particular, we study for both schemes the evolutionary outcomes
depending on whether the population is initially constituted
of random individuals or seeded with pre-evolved
efficient specialists. Then we present the results of computational
analyses in order to reveal and understand more
deeply the mechanisms at play. In a final experiment, we
reveal key mechanisms which could be investigated to solve
this problem. Finally we discuss our findings and shed light
on interesting perspectives for future work.

Methods 

We evaluate two robotic agents in an 800 by 800 units square
arena devoid of any obstacles except for the foraging targets.
At the beginning of a simulation, 18 targets are randomly
positioned in the environment. While the agents may move
freely in the arena, the targets’ positions are fixed. For a
target to be collected, any agent needs to stay in contact with
it for a specified amount of time (800 simulation steps). The
target is removed after this duration and put back at another
random position so that the number of targets is kept the
same throughout a simulation. We consider that cooperative
foraging happens if both individuals are in contact with the
target when it is removed. When an agent collects a target,
it is rewarded 50 if this target has been foraged in a solitary
manner or 250 if both agents have cooperated to collect it.
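
The reward rule above can be written as a small function. This is a minimal sketch, assuming that each of the two cooperating agents receives the full cooperative reward of 250; the names `collection_rewards`, `SOLITARY_REWARD` and `COOPERATIVE_REWARD` are hypothetical and do not come from the original source code.

```python
SOLITARY_REWARD = 50       # target foraged by a single agent
COOPERATIVE_REWARD = 250   # both agents in contact when the target is removed


def collection_rewards(agents_in_contact):
    """Map each agent touching the target at removal time to its reward.

    Assumption: with two agents in contact, each one receives 250.
    """
    cooperative = len(agents_in_contact) == 2
    reward = COOPERATIVE_REWARD if cooperative else SOLITARY_REWARD
    return {agent: reward for agent in agents_in_contact}
```

For instance, `collection_rewards({"red"})` gives the red agent 50, while `collection_rewards({"red", "blue"})` gives 250 to each agent.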

Each agent is circular-shaped with a diameter of 20 units
and possesses a collection of different sensory inputs. The
first type of input is a 90-degree front camera composed
of 12 rays, each one indicating the type and distance
to the nearest object (either another agent or a target). The
other type of input consists of 12 proximity sensors evenly
distributed around the agent’s body. With a range of twice the
agent’s diameter, each proximity sensor outputs the proximity
of the nearest obstacle in its range.

Both agents begin the simulation next to each other at the
same end of the arena and can move according to the outputs
of their neural network. This neural network is a fully
connected multi-layer perceptron with one hidden layer. The
inputs of the neural network comprise all the sensory
information of the agent, i.e. 36 input neurons for the camera
(3 inputs for each ray) and 12 for the proximity sensors.
A final input neuron whose value is always 1 is used as a bias
neuron. This brings the total number of input neurons to
49. The hidden layer is constituted of 8 neurons while the
2 neurons of the output layer return the speed of the agent’s
wheels. A sigmoid is used as the activation function of each
neuron. Finally, the topology of the network is kept constant
during the experiments.

The population of individuals is evolved with a classical
evolutionary algorithm. The genotype of each individual
is constituted of a collection of the 410 real-valued connection
weights of the neural network. At each generation
of the algorithm, every individual is evaluated by being
successively paired with another individual randomly chosen in
the population 5 times. Each pair interacts in the setting
presented before for 20000 simulation steps, which we call a
trial. We perform 5 trials for each pair of individuals in order
to decrease the impact of the targets’ random positions
on the individuals’ performance. The fitness score of an
individual is computed as the average reward per trial.
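
The controller and genotype size described above can be sketched as follows. The sketch assumes that a constant bias unit also feeds the output layer, since 49 × 8 + (8 + 1) × 2 = 410 matches the stated genotype length; the weight layout inside the genome and all function names are hypothetical.

```python
import math

N_INPUTS = 49    # 36 camera + 12 proximity + 1 bias, as stated in the text
N_HIDDEN = 8
N_OUTPUTS = 2
# Assumption: a bias unit is appended to the hidden layer, giving 410 weights.
GENOME_LENGTH = N_INPUTS * N_HIDDEN + (N_HIDDEN + 1) * N_OUTPUTS


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(genome, sensors):
    """Map 48 sensor values to two wheel speeds in (0, 1)."""
    assert len(genome) == GENOME_LENGTH and len(sensors) == N_INPUTS - 1
    inputs = list(sensors) + [1.0]  # constant bias input
    hidden = []
    for h in range(N_HIDDEN):
        weights = genome[h * N_INPUTS:(h + 1) * N_INPUTS]
        hidden.append(sigmoid(sum(w * x for w, x in zip(weights, inputs))))
    hidden.append(1.0)  # hypothetical hidden-layer bias unit
    split = N_INPUTS * N_HIDDEN
    width = N_HIDDEN + 1
    outputs = []
    for o in range(N_OUTPUTS):
        weights = genome[split + o * width:split + (o + 1) * width]
        outputs.append(sigmoid(sum(w * x for w, x in zip(weights, hidden))))
    return outputs
```

With an all-zero genome every neuron outputs 0.5, which gives a quick sanity check of the wiring.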

The population for the next generation is created according
to two different selection schemes:

• (μ + λ) elitist selection: the population of the next generation
is constituted of the μ best individuals from this
generation and λ offspring sampled from the best individuals.

• Fitness-proportionate: offspring are randomly sampled
from the current generation to constitute the population of
the next generation. The probability to sample a particular
parent is proportional to this parent’s fitness score.

Regardless of the selection method used, every offspring
is a mutated clone of its parent and no recombination is
used in our algorithm. The probability for each gene to
mutate is $5 \times 10^{-3}$ and mutations are sampled according to
a gaussian operator with a standard deviation of $2 \times 10^{-2}$.
Finally, experiments were conducted with the robotic 2D
simulator of SFERESv2 (Mouret and Doncieux, 2010), a
framework for evolutionary computation. The source code
for the experiments is available for download at
http://pages.isir.upmc.fr/~bredeche/Experiments/ALIFE2016-specialisation.tgz.
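
The two selection schemes and the mutation operator described above can be sketched in a few lines. This is an illustrative sketch and not the SFERESv2 implementation: the function names are hypothetical, the elitist step assumes μ = λ = N/2 (the setting used in the elitist experiments), and it assumes that each offspring's parent is drawn uniformly among the μ survivors.

```python
import random

MUTATION_RATE = 5e-3    # per-gene mutation probability, from the text
MUTATION_SIGMA = 2e-2   # standard deviation of the gaussian mutation


def mutate(genome, rng):
    """Return a mutated clone: each gene is perturbed with probability 5e-3."""
    return [
        g + rng.gauss(0.0, MUTATION_SIGMA) if rng.random() < MUTATION_RATE else g
        for g in genome
    ]


def elitist_step(population, fitness, rng):
    """(mu + lambda) step with mu = lambda = N/2: keep the best half, refill with mutants."""
    mu = len(population) // 2
    ranked = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    parents = [population[i] for i in ranked[:mu]]
    # Assumption: parents of offspring are drawn uniformly among the survivors.
    offspring = [mutate(rng.choice(parents), rng) for _ in range(len(population) - mu)]
    return parents + offspring


def fitness_proportionate_step(population, fitness, rng):
    """Sample N parents with probability proportional to fitness, then mutate.

    Note: random.choices raises ValueError if all fitness values are zero.
    """
    chosen = rng.choices(population, weights=fitness, k=len(population))
    return [mutate(parent, rng) for parent in chosen]
```

The structural difference is visible in the code: the elitist step copies the best half unchanged into the next generation, whereas the fitness-proportionate step resamples the whole population, so any individual with non-zero fitness keeps a chance of reproducing.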

Behaviours of Specialists in a Cooperative 
Foraging Task 

We showed in a previous article (Bernard et al., 2015) that
two cooperative strategies could evolve in this particular
task: turning (between two turners) and leader/follower (between
a leader and a follower). Both of these strategies
achieve cooperative foraging but with different efficiency.

In the turning strategy, both individuals turn around one 
another so that they can keep the other individual in their 
line of sight and stay close to it (see Figure 1(a)). At the 
same time, the two individuals try to get closer to a target. 
This way, as soon as one of the two individuals is in contact 
with a target, the other individual can join it so the target 
may be collected cooperatively. Consequently, both individ¬ 
uals adopt a similar behaviour in this strategy and can be 
described as generalists. 

In the leader/follower strategy, the individuals specialise 
in two roles: a leader and a follower. The leader always 
gets on the target first and checks rarely for its partner. In 
comparison, the follower tries to keep its leader in view dur¬ 
ing the entirety of the simulation so that it can get on the 
same target (see Figure 1(b)). Consequently, we observe the 
expression of two clearly heterogeneous behaviours which 
implies that both individuals are specialists. More impor¬ 
tantly we also showed that, given our agents’ capabilities, 
each phenotype needed to be encoded by a different geno¬ 
type for specialisation to happen. 
Figure 2: Average reward and leadership proportion with a leader/follower or turning strategy. Boxplots of (a) the average
reward and (b) the leadership proportion over 20 independent trials for the leader/follower and turning strategies. The leadership
ratio of an individual represents the propensity for one individual among the pair to arrive first more often than its partner on a
target collected in a cooperative fashion. The position of each target at the beginning of each trial was randomized.




Figure 1: Snapshots of the simulation after an entire trial 
in the foraging task. The path of each robotic agent from 
their initial positions (black dots) is represented in red and 
blue. The blue discs represent the 18 targets in the envi¬ 
ronment. When a target is foraged by the two agents, a red 
cross (resp. blue) is drawn on the target if the red agent 
(resp. blue) arrived on it first. Each snapshot corresponds to 
a trial where agents adopted a different strategy: (a) turning 
or (b) leader/follower. 

Figure 2(a) shows the efficiency of each strategy, defined
as the average reward obtained by the two individuals during
a simulation over 20 independent trials (with randomized
targets’ positions for each trial). We can see that, as expected,
the leader/follower strategy achieves a significantly
higher efficiency (Mann-Whitney U-test on the average reward
over 20 trials, p-value < 0.0001). This difference in
efficiency is directly related to a highly significant difference
in the proportion of leadership as shown in Figure 2(b)
(Mann-Whitney U-test on the leadership proportion over 20
trials, p-value < 0.0001). We compute this proportion by
looking at the propensity for one of the two individuals to
arrive first more often on a target foraged cooperatively (i.e.
the emergence of a leader).
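
One plausible way to compute the leadership proportion is sketched below. It assumes the measure is the share of cooperatively foraged targets reached first by the agent that arrives first most often, so that values range from 0.5 (no leader) to 1.0 (one agent always leads); the exact formula is not given in the text and the function name is hypothetical.

```python
def leadership_proportion(first_arrivals):
    """Share of cooperative collections led by the most frequent first arriver.

    first_arrivals holds one agent label per cooperatively foraged target,
    naming the agent that reached that target first.
    """
    if not first_arrivals:
        return None  # undefined when nothing was foraged cooperatively
    counts = {}
    for agent in first_arrivals:
        counts[agent] = counts.get(agent, 0) + 1
    return max(counts.values()) / len(first_arrivals)
```

For example, a trial where the red agent arrived first on 9 of 10 cooperative targets yields 0.9, while a perfectly alternating pair yields 0.5.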

Evolving Heterogeneous Behaviours with an 
Elitist Selection 

Bootstrapping leader/follower strategies 

In this first experiment, we are interested in the emergence of
a leader/follower strategy when starting with a population of
random individuals under a (μ + λ) elitist selection. In order
to investigate the influence of population size, we tested
three different sizes N: 20, 40 and 100. For each population
size, we conducted 11 independent runs, each one lasting
90000 evaluations. For each population size N, we defined
μ (i.e. the number of parents) and λ (i.e. the number of
offspring) as N/2. For example, when population size was 100,
50 individuals were kept from the previous generation and
used to create 50 mutated offspring.

Table 1 shows the distribution of the best individuals’
strategies at the last generation of evolution for each population
size. We consider a behaviour to be cooperative when
more than 50% of the total number of targets collected are
foraged cooperatively. First, we observe that in every replicate
individuals always end up evolving a cooperative strategy.
We also see that evolving a leader/follower strategy
is difficult as specialists evolve in only 1 run (out of 33),
and only when the population size is 100. These results suggest
that it is nearly impossible to evolve such heterogeneous
behaviours with this setting.
Pop. size   # L/F Strat.   # Turning Strat.   # NC Strat.   Total
20          0              11                 0             11
40          0              11                 0             11
100         1              10                 0             11


Table 1: Strategies evolved by the best individuals under
elitist selection with an initially random population.
Distribution of the different strategies adopted by the best
individuals at the last evaluation in each of the replicates for
different population sizes N. We indicate in each cell the
number of simulations where a particular strategy evolved.
Populations were evolved under a (μ + λ) elitist selection,
with μ = N/2 and λ = N/2. Individuals’ genotype values were
initially random. In the table "L/F" stands for leader/follower
and "NC" for "Non-Cooperative".

However, looking at the whole evolutionary history
reveals additional information about the evolution of
specialists. We show in Figure 3 the proportion of evolutionary
time when the best individual of each run adopted a
leader/follower strategy. This value is computed as the ratio
of the number of generations when the leadership ratio was
high enough (over a threshold value of 0.6) out of the total
number of generations. We observe that even if the best
individuals end up adopting a generalist strategy, this was
not the case during the entirety of the evolution. In particular,
there is a significant increase (Mann-Whitney, p-value
< 0.05) in the number of generations where the best individual
showed a leader/follower strategy when population
size was 100 compared to a population size of 20. This
implies that it is possible to evolve specialists but that keeping
them stable in the population over time is nearly impossible.
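
The proportion plotted in Figure 3 follows directly from the per-generation leadership ratios. The following sketch uses the 0.6 threshold stated in the text; the function name and the example values are hypothetical.

```python
LEADERSHIP_THRESHOLD = 0.6  # threshold value stated in the text


def proportion_leader_follower(ratios_per_generation, threshold=LEADERSHIP_THRESHOLD):
    """Fraction of generations whose best individual exceeds the threshold."""
    above = sum(1 for ratio in ratios_per_generation if ratio > threshold)
    return above / len(ratios_per_generation)


# Hypothetical leadership ratios of the best individual over four generations.
example = proportion_leader_follower([0.5, 0.7, 0.9, 0.55])
```

Here two of the four generations exceed the threshold, so `example` is 0.5.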

Maintaining heterogeneity in a population seeded 
with specialists 

In order to investigate the lack of stability of genotypic polymorphism
under elitist selection, we design another experiment.
We separately evolve a population of efficient leader
individuals and follower individuals beforehand. We then
replace the worst individuals w.r.t. fitness score in the population
of leaders with a certain number of followers. Our goal
is to study whether artificially constructing such a population could
result in the invasion and fixation of a stable leader/follower
strategy.

The number of followers initially inserted in the population
was varied according to two different settings: (1) we
add only one follower or (2) we add a number of followers
equal to half of the population. Experiments were replicated
11 times for 90000 evaluations with population sizes of
40 and 100.

We show (Table 2) no significant differences in comparison
to simulations with a population constituted of initially


[Plot for Figure 3; x-axis: Population size]


Figure 3: Proportion of time with a leader/follower strat¬ 
egy. Boxplots of the number of generations where the best 
individual in each replicate adopted a leader/follower strat¬ 
egy out of the total number of generations. We consider that 
the best individual adopted a leader/follower strategy when 
its leadership ratio was over a threshold value of 0.6. 

random individuals w.r.t. the number of simulations where
a leader/follower strategy evolved. These results suggest
that even when purposely adding specialists, their stability
in the population is still very hard to achieve. This implies
that whether the behaviours are evolved from random genotypes
or bootstrapped with efficient individuals is not as important
as maintaining heterogeneity in the population. In
particular, in only one replicate among the 3 runs where a
leader/follower strategy was eventually adopted (out of 44) were
the specialists initially added maintained. In the 2 other
runs we observe multiple emergences and disappearances of
specialists throughout evolution.

Evolution Under a Fitness-Proportionate 
Selection 

In this next experiment we want to investigate the evo¬ 
lution of heterogeneous behaviours when using a fitness- 
proportionate selection. As fitness-proportionate is known 
to allow frequency-dependent selection, we hypothesize that 
it may facilitate the evolution of specialists. 

Bootstrapping leader/follower strategies 

As with the elitist selection, we replicated our experiments
in 11 independent runs of 90000 evaluations each.
Likewise, population sizes were 20, 40 and 100.

We show in Table 3 that results are highly different when
using such a selection scheme. In particular, the fitness-proportionate
selection performed poorly w.r.t. evolving cooperative
strategies. For each population size, no cooperative
strategy evolved at all in the vast majority of replicates.
However in one particular run we do observe the emergence
Pop. size   Followers added   # L/F Strat.   # Turning Strat.   # NC Strat.   Total
40          1                 0              11                 0             11
40          20                0              11                 0             11
100         1                 1              10                 0             11
100         50                2              9                  0             11


Table 2: Strategies evolved by the best individuals under
elitist selection when adding followers. Distribution of
the different strategies adopted by the best individuals at the last
evaluation in each of the replicates for different population
sizes N. We indicate in each cell the number of simulations
where a particular strategy evolved. Populations were
evolved under a (μ + λ) elitist selection, with μ = N/2 and
λ = N/2. The population was initially seeded with a population
of leaders in which we added a specific number of
followers. In the table "L/F" stands for leader/follower and
"NC" for "Non-Cooperative".


Pop. size   # L/F Strat.   # Turning Strat.   # NC Strat.   Total
20          0              1                  10            11
40          0              1                  10            11
100         1              2                  8             11


Table 3: Strategies evolved by the best individuals under
fitness-proportionate selection with an initially random
population. Distribution of the different strategies
adopted by the best individuals at the last evaluation in each
of the replicates for different population sizes. We indicate
in each cell the number of simulations where a particular
strategy evolved. Populations were evolved under a
fitness-proportionate selection. Individuals’ genotype values
were initially random. In the table "L/F" stands for
leader/follower and "NC" for "Non-Cooperative".

and fixation of specialists. This is similar to what was ob¬ 
served under elitist selection w.r.t. evolving specialists. 

Yet a closer look at the dynamics of evolution under a
fitness-proportionate selection yields interesting results. In
particular, there is not much variation in the strategy adopted
by the best individuals throughout evolution. This is consistent
with the fact that the bootstrap of a cooperative strategy
was not observed in most of the replicates: fitness-proportionate
selection is not efficient in evolving any cooperative
behaviour. Consequently, there is not much variation in the
proportion of individuals adopting a leader/follower strategy
during evolution. As a matter of fact, we observe that in the
only replicate where there was genotypic polymorphism at
the end of the simulation, specialists were already present at
the random initialisation of the population and did not evolve
through mutation. This is very different from the elitist
selection, where we observe multiple emergences of specialists
(even briefly) during evolution in many different runs.

Maintaining heterogeneity in a population seeded 
with specialists 


Pop. size   Followers added   # L/F Strat.   # Turning Strat.   # NC Strat.   Total
40          1                 7              0                  4             11
40          20                8              0                  3             11
100         1                 10             0                  1             11
100         50                10             0                  1             11


Table 4: Strategies evolved by the best individuals under
fitness-proportionate selection when adding followers.
Distribution of the different strategies adopted by the
best individuals at the last evaluation in each of the replicates
for different population sizes N. We indicate in each cell the
number of simulations where a particular strategy evolved.
Populations were evolved under a fitness-proportionate
selection. The population was initially seeded with a population
of leaders in which we added a specific number of
followers. In the table "L/F" stands for leader/follower and
"NC" for "Non-Cooperative".

As expected from previous results, fitness-proportionate
selection performs well in terms of stability of heterogeneous
behaviours. We show in Table 4 that in the majority of
replicates the best individuals adopt a leader/follower strategy
at the end of the simulations. This is particularly true when
the population size is large enough (100). A major difference
with elitist selection is that in all replicates where a
leader/follower strategy was observed at the end of the run,
the specialists were maintained from the start throughout
evolutionary time. These results suggest that, although not
efficient at bootstrapping cooperative behaviours,
fitness-proportionate selection performs well w.r.t. the
stability of genotypic heterogeneity. Furthermore, we can
hypothesize that this selection scheme is good at maintaining
heterogeneity specifically because it largely fails (under our
choice of parameters) at bootstrapping any cooperative strategy.

Computational Analyses of Population 
Dynamics 

In this section, our goal is to understand more deeply
the dynamics at play which allow the invasion of suboptimal
generalists even when efficient specialists are present.
To that end we run computational analyses based on the
expected fitness of each of the three phenotypes. Table 5
shows the average payoff of pair-wise simulations between
each pair of phenotypes. We consider the payoffs for both
phenotypes in each pair to be identical as no significant
differences were observed between their payoffs.

Phenotype   Leader   Follower   Turner
Leader      1265     5000       3480
Follower    5000     100        2750
Turner      3480     2750       2755

Table 5: Payoff matrix for pair-wise simulations of each
phenotype. Average payoffs of each phenotype against every
phenotype in a pair-wise simulation. Each pair was evaluated
10 times in order to decrease the stochastic effects of
the initial conditions (i.e. random positions of the targets).

Several observations can be made directly from these
results. First, we can confirm that the leader/follower strategy
displayed by a (leader, follower) pair is clearly the best
strategy. However, each of these two phenotypes performs
very poorly against itself, with the worst payoff obtained by
a pair of two followers. Secondly, turner individuals also
perform very well against leaders. Lastly, there is no
significant difference w.r.t. payoffs when a turner is paired
with a follower or another turner. These last two points hint
at a shared lineage between followers and turners.

Indeed, analyses of the genotypes’ histories in our previous
experiments reveal that turner individuals in fact descend
from follower individuals. This means that they act
as followers when interacting with leaders but are not as
efficient. However, they are a lot more efficient than followers
when paired with individuals of the same phenotype (or with
followers).

From this payoff matrix, we run computational analyses
to model the gradient of the phenotypes’ distribution in an
infinite population. The fitness W_i of a particular phenotype i
is computed as follows:

\[ W_i = \sum_{j=1}^{M} P(i,j) \cdot F(j) \]

with j the phenotype it is paired with, M the number of
different phenotypes (3), P(i,j) the payoff of phenotype i
against j and F(j) the proportion of phenotype j in the
population. From this fitness, we can deduce the variation of
the phenotypes’ distribution by updating the proportion F of each
phenotype i:

We show in Figure 4(a) a vector field of this gradient. We
can see that there actually exists an equilibrium between the
three phenotypes (marked by a dot at the crossing of the
dotted lines). This implies that even though the turner
strategy is not the most efficient one, this phenotype is still
expected to invade and coexist with the
two other phenotypes.
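
The fitness computation above can be sketched in a few lines of Python. The payoff values are those of Table 5 and `fitness` implements W_i directly. The update rule in `replicator_step` is an assumption: it is the standard discrete replicator rule, used here only for illustration, and is not necessarily the rule behind Figure 4(a). The function and variable names are likewise illustrative.

```python
# Payoff matrix from Table 5; rows and columns are ordered Leader, Follower, Turner.
PHENOTYPES = ["leader", "follower", "turner"]
PAYOFF = [
    [1265, 5000, 3480],
    [5000, 100, 2750],
    [3480, 2750, 2755],
]


def fitness(proportions):
    """Expected fitness W_i = sum_j P(i, j) * F(j) of each phenotype."""
    return [sum(p * f for p, f in zip(row, proportions)) for row in PAYOFF]


def replicator_step(proportions):
    """One discrete replicator update (assumed rule, not taken from the paper).

    Each proportion is rescaled by its fitness relative to the population mean,
    so the returned proportions still sum to 1.
    """
    w = fitness(proportions)
    mean_w = sum(f * wi for f, wi in zip(proportions, w))
    return [f * wi / mean_w for f, wi in zip(proportions, w)]


# Example: a population of leaders with a few followers and turners.
F = [0.90, 0.05, 0.05]
print(dict(zip(PHENOTYPES, fitness(F))))
print(dict(zip(PHENOTYPES, replicator_step(F))))
```

In a population made almost entirely of leaders, both followers and turners have a higher expected fitness than leaders, which is consistent with the observation that either type can start to invade.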

We can hypothesize that we could not observe this
equilibrium in our robotic simulations because of the stochastic
effects arising from selection in a finite population. In
order to study this hypothesis we ran additional computational
simulations based on the same payoff matrix. The initial
population is entirely composed of leaders and the selection
method is an elitist (N/2 + N/2) evolution strategy where N is
the population size. Every 10 generations, each offspring
has a probability of 1 × 10^-2 to mutate into either of the two
other phenotypes.

Figure 4(b) shows the final distribution of phenotypes
after 1500 generations of evolution for N = 20, N = 100
and N = 1000 in 11 independent replicates. We can see
that increasing the population size also increases the
probability of reaching an equilibrium where the three
phenotypes coexist. We actually observe that the distribution of
phenotypes at the last generation of evolution gets closer to
the predicted equilibrium as population size increases. This
implies that when population size increases, the probability
of losing particular phenotypes decreases. In other words,
the effect that the stochasticity of fitness evaluation has on
the sampling of the genotypes for the next generation is
mitigated: population size is essential to the maintenance of
specialists.

General Properties for Evolving 
Heterogeneous Behaviours 

From the previous section, we can hypothesize two key
properties for the successful evolution of genotypic
polymorphism. First, we showed that population size needs to
be large enough to decrease the probability that
heterogeneity is lost during evolutionary time. Even
under an elitist selection where the best individuals are
immediately selected, the stochastic nature of fitness evaluation
entails that there is no guarantee that both types get selected.
This means that a performance-biased selection may lead to
a new population whose composition does not accurately
represent the genotypic diversity of the previous one.
Therefore, there needs to be a mechanism for the preservation of
genotypic diversity. Second, we previously saw that one key
reason for the invasion of turner individuals is that, while
followers perform badly against themselves, this is not the
case for turners. This means that the manner in which
robots are paired is essential for achieving specialisation.

In order to test these hypotheses we design a last
experiment where we diverge from the initial problem and now
coevolve two separate populations. In this coevolution
algorithm, each individual of one population is always evaluated
against an individual of the other population (5 times as in
previous experiments). Then, each population separately
undergoes selection under an elitist (10 + 10) selection method
to create the population of the next generation (which means
that each population size is 20). We conducted 11 independent
replicates which lasted 90000 evaluations each. The
populations were initially composed of random individuals.

Figure 4: Vector field of the gradient of phenotypes’ proportions and proportions of phenotypes at the last generation
of evolution. (a) Vector field of the gradient of phenotypes’ proportions in an infinite population. The strength of variation is
indicated by the color of the arrow. (b) Distribution of phenotypes at the last generation of evolution for all three population
sizes. Evolution lasted 1500 generations and results were replicated across 11 independent simulations. The initial population
was entirely composed of leaders.

# L/F Strat.   # Turning Strat.   # NC Strat.   Total
11             0                  0             11

Table 6: Strategies evolved by the best individuals when
coevolving two populations. Distribution of the different
strategies adopted by the best individuals at the last
evaluation in each of the 11 replicates. We indicate in each
cell the number of simulations where a particular strategy
evolved. Two populations were coevolved under elitist
selection and the individuals’ genotype values were initially
random. In the table, "L/F" stands for leader/follower and
"NC" for "Non-Cooperative".

We show (Table 6) that when using coevolution, we always
evolve specialists in every replicate. Moreover, this
algorithm is highly stable as the heterogeneous behaviours
that emerged were never lost during evolution in any
replicate. This means that coevolution is highly efficient both
for the bootstrap of a leader/follower strategy and for its
maintenance throughout evolution. Regarding our hypothesized
properties, we can check that the coevolution algorithm
respects both of them. Firstly, as populations are separately
coevolved, we make sure that performance-based selection
does not accidentally lead to the disappearance of
specialists. Thus we ensure that the populations’ genotypic
diversity is protected. Secondly, we create a very specific
pairing between individuals. Indeed, individuals inside the same
population are never partnered with one another. This means
that followers are always paired with leaders. As turners
thus possess no fitness benefit over the other phenotypes,
their invasion is prevented. The question is open as to how
to endow an algorithm working on a single population with
such properties.

Discussion and Conclusions 

In this paper, we investigated the evolution of specialisation
through a leader/follower strategy in a cooperative foraging
task. Our goal was to reveal the difficulties that arise when
trying to evolve genotypic polymorphism in a single
population. To that end, we mainly studied the dynamics of
evolution with two different selection methods: a (μ + λ) elitist
evolution strategy and fitness-proportionate selection.

We first showed that the long-term evolution of a
leader/follower strategy was nearly impossible with an
elitist selection. However, bootstrapping specialists was not
a problem as we observed that they frequently emerged
during evolution. The major obstacle was rather to
maintain heterogeneity over evolutionary time. Indeed, even
when adding efficient followers to a population of leaders
to force the adoption of a leader/follower strategy,
specialists could not be maintained. In comparison, the properties
shown by the fitness-proportionate algorithm were quite the
opposite. While it was almost incapable of evolving a
leader/follower strategy (or any other cooperative strategy),
fitness-proportionate selection demonstrated high
stability. It was therefore capable of maintaining specialists when
present. We thus revealed two critical properties for
evolving heterogeneous behaviours in a single population:
bootstrapping these behaviours and maintaining them
throughout evolution.

We then ran computational analyses and showed that
while a pair of turners is indeed less efficient w.r.t. payoff
than a leader/follower pair, it is a lot more efficient
than a pair of leaders or a pair of followers. As a result, these
individuals can easily invade part of the population.
Moreover, we also showed that the maintenance of specialists
was very sensitive to population size. Performance-based
selection can indeed affect heterogeneity in the
composition of the next generation’s population. Finally, a
coevolution algorithm, which we showed to be always successful
in evolving heterogeneous behaviours, solved both of these
problems with (1) specific partner selection, as pairs
were composed of individuals from different populations,
and (2) protection of the evolved behaviours by applying
selection separately to the two populations. While this
algorithm is not concerned with genotypic polymorphism in a
single population, it reveals effective mechanisms
which could be studied to solve our problem.

This raises several interesting perspectives on how to
solve this problem. First, niche protection could prevent the
disappearance of the efficient but unstable leader/follower
strategy. As a matter of fact, coevolution is akin to a
particular type of niche protection with 2 niches. However, we
intend to investigate how we could implement such a
mechanism without specifying explicitly the number or the
organization of the niches. Rewarding diversity (Lehman and
Stanley, 2008) is also known as an effective way to protect
novel behaviours and could be another promising direction.
In particular, a multiobjective algorithm on performance and
diversity (Doncieux and Mouret, 2014), by rewarding
genotypic and phenotypic diversity, may protect evolved
specialists.

Secondly, we showed that because partners were chosen
randomly among the population, it created the opportunity
for a "parasitic" strategy to invade. An interesting direction
for future work could be to investigate restrictions in the
choice of partners. For example, it would be compelling to
investigate how the individuals could evolve to select their
partner based on genotypic or phenotypic information.

Abstract

In this paper we show how evolving robots can develop
behaviors displaying a modular organization characterized by
semi-discrete and semi-dissociable sub-behavioral units
playing different functions. In our experiments, the development
of differentiated behaviors is not realized through the
subdivision of the control system into modules and/or through
the utilization of differentiated training processes. Instead,
it simply originates as a consequence of the adaptive
advantage provided by the possibility to display and use
functionally specialized behaviors. These are selected by evolution
not only with respect to their capability to perform a given
sub-function but also with respect to their capability to support
smooth and effective transitions to and from other behaviors. This
is achieved by having different co-adapted behaviors and by
evaluating the variations affecting the behaviors on the basis
of the impact they have on the overall performance of the
robots. Moreover, this process enables the development of the
ability to carry out preparatory actions that are necessary for
the effective execution of the following behaviors. We refer
to this type of modularity as functional modularity, since
unlike structural modularity, it is not based on behavioral
modules that are separated by clear boundaries and/or that are
programmed or trained independently.

Introduction 

The acquisition of new behavioral skills and the ability to
progressively expand the behavioral repertoire represent
one key aspect of natural intelligence and a fundamental
capability of robots that operate in dynamic and uncertain
environments. One way to achieve this objective consists
in using a structural modular approach in which different
layers or modules of the robot’s controller are responsible
for the production of different corresponding behaviors and
in which the behavioral repertoire of the robots can be
expanded by adding new layers or modules in an incremental
fashion. Indeed, the discovery and utilization of control
architectures of this type (Brooks, 1986; Arkin, 1998) enabled
tremendous progress in robotics.

In structural modular architectures of this type each
module has the following characteristics: it is responsible for the
production of a specific behavior, it is separated from the
other modules by clear boundaries, and it is programmed
or trained independently. These characteristics present
advantages but also drawbacks that can outweigh the
advantages, especially when there is a significant interdependence
between the different behaviors. On the one hand, the fact
that modules are separated by clear boundaries enables the
utilization of a divide and conquer strategy, which makes it
possible to divide the overall design problem into a set of
partially independent simpler problems. Moreover, the separation
and independence among modules potentially provide a
straightforward solution for the progressive expansion of the
behavior repertoire, which can be realized through
the progressive addition of new modules. On the other hand,
it also inevitably leads to solutions in which the importance
of the interdependence between the different behaviors is
neglected. Furthermore, the rigid separation between the
modules prevents the exploitation of solutions that require the
introduction of minor modifications to previously developed
modules/behaviors, modifications that might be crucial for
re-using previous capabilities in the realization of new
additional skills.

In that respect it is important to point out that the
behavior of natural organisms typically displays a modular
organization characterized by somewhat semi-discrete and
semi-dissociable subunits, or sub-behaviors, playing
different functions or sub-functions (West-Eberhard, 2003).
These sub-behaviors are not completely separated,
dissociable, and independent. The modular organization of behavior
in natural organisms is therefore characterized both by
discreteness and the possible presence of boundaries between
sub-behaviors and by connectedness and integration among
them (West-Eberhard, 2003). Moreover, it is important to
consider that the effective execution of a behavior
performing a given function often requires the execution of
preparatory actions. For example, the effective execution of a
grasping behavior requires the execution of preparatory actions
that appropriately modify the posture of the hand already
during the execution of the reaching behavior that precedes
the grasping activity (von Hofsten and Ronnqvist, 1988).

In this paper we show how evolving robots can develop
behaviors displaying a modular organization characterized
by semi-discrete and semi-dissociable sub-behavioral units
playing different functions. In our experiments the
development of differentiated behaviors is not realized through
the subdivision of the control system into modules and/or
through the utilization of differentiated training processes.
Instead, it simply originates as a consequence of the
adaptive advantage provided by the possibility to display and use
functionally specialized behaviors. These are selected by
evolution not only with respect to their capability to perform
a given sub-function but also with respect to their capability
to support smooth and effective transitions to and from other
behaviors. This is achieved by having different co-adapted
behaviors and by evaluating the variations affecting the behaviors
on the basis of the impact they have on the overall
performance of the robots. Moreover, this process enables the
development of the ability to carry out preparatory actions that
are necessary for the effective execution of the following
behaviors. We refer to this type of modularity as functional
modularity since, as in the case of structural modularity, it
is characterized by the presence of differentiated behaviors
achieving specialized functions but, differently from
structural modularity, it is not based on behavioral modules which
are separated by clear boundaries and that are programmed
or trained independently.

In the context of structural modular approaches, the
possibility to realize smooth transitions between behaviors also
depends on the arbitration mechanism utilized. In the case of
competitive arbitration mechanisms, in which the transition
between behaviors is achieved by suddenly shifting the
control of the robot actuators from one module to another, the
transitions tend to be abrupt. Instead, in cooperative
arbitration mechanisms, in which multiple modules can
concurrently control the robot actuators and in which the arbitration
is realized by gradually changing the relative weight of the
different modules (Arkin, 1998), the transitions between
behaviors tend to be smoother. However, the behavior
produced during the transition phase by the latter
approach is necessarily a weighted average
of the behaviors produced by the individual
modules. This type of average behavior is not necessarily
effective. Moreover, in a transition between two behaviors,
this method does not provide a way to realize preparatory
actions, i.e. actions that belong neither to the first nor
to the second behavior but that represent a prerequisite for
the appropriate execution of the second behavior.

Our work is related to previous evolutionary robotics
studies that have addressed the evolution of multiple behaviors
(Izquierdo and Buhrmann, 2008; Seth, 2011; Schrum and
Miikkulainen, 2012; Petrosino et al., 2013; Williams and Beer,
2013). In these experiments, however, the synthesis and the
exhibition of multiple behaviors represented the only
viable solution since the evolving robots were required
to carry out mutually exclusive tasks (e.g. eating or avoiding
eating a specific food type (Seth, 2011; Petrosino et al., 2013),
or moving on the basis of wheeled or legged actuators
(Williams and Beer, 2013)). In another related work, Rahim
et al. (2014) evolved neural network controllers that received
as input the output produced by a set of pre-programmed
modular controllers. Thus, to the best of our knowledge,
no previous studies focused on whether behavior
differentiation and functional modularity can be observed in robots
evolved for the ability to perform a single task.

The Method 

To study this issue we decided to consider a cleaning
experimental scenario in which a wheeled robot needs to
vacuum clean the floor of an unknown indoor environment.
We chose this problem since it represents the first (and
still the most significant) successful application domain of
autonomous robot solutions (Roomba, the first autonomous
vacuum-cleaning robot, developed by iRobot® under the
supervision of Rodney Brooks and commercialized from
2002, has sold more than 10 million units to date; see
IRobot (2013)). Rather than designing the controller by hand,
we studied whether effective controllers can be developed
from scratch through an evolutionary method in which the
evolving robots are selected on the basis of the percentage
of successfully cleaned surface, i.e. on the basis of a scalar
value that rates their overall ability to perform the task.

It is important to point out that we chose this domain also
because it involves the execution of a task with a single goal
(cleaning the environment) that does not necessarily require
modular solutions. This enables us to study whether and
how functionally modular robots evolve, whether and why
behavior differentiation and functional modularity provide
an advantage with respect to non-modular solutions, and
ultimately what the characteristics and functions of the
evolved sub-behaviors are. In contrast, domains involving multiple
conflicting goals, such as those used in the literature
addressing the study of action selection cited above, necessarily
require the development of solutions characterized by
multiple behaviors and implicitly constrain the number and type
of required sub-behaviors.

The investigation of the cleaning problem also makes it
possible to compare our evolved solutions with those developed by
companies that sell cleaning robots. In that respect, the fact
that the behavioral policies displayed by different versions
of the Roomba and by similar robots produced by other
companies differ significantly (Palleja et al., 2010) demonstrates
that finding the optimal solution(s) to this problem is far from
trivial.

The Task, the Environment and the Robot 

To evolve robots that are robust with respect to
environmental variations we evaluated each robot for 3 trials/cleaning
sessions. At the beginning of each trial, the initial
position and orientation of the robot in the environment, and
the specific characteristics of the environment in which it was
situated, such as wall dimensions, were randomly varied
within limits.

Each trial lasted 6 minutes and 15 seconds. This
represents a rather short period of time, although performing a
precise comparison with the time required by commercial
robots to clean completely or almost completely a surface
with similar properties is impossible due to the lack of
similar data (for some indications see Palleja et al. (2010)).

To compute the cleaning performance we calculated the
percentage of 20x20cm non-overlapping areas visited by the
robot at least once during a trial. We used a concave
environment (Figure 1) constituted by a large central area and
by four peripheral corridors that represent a room-like
environment. The average environment had a central area with a
size of 6.8m and four corridors with a size of 3.78m in total.
The exact size of the environment, however, was randomly set
at the beginning of each trial. This was realized by varying
the height and width of the central area and of the corridors by
±33% and ±18%, respectively, across trials. The
robot used was a MarXbot (Bonani et al., 2010), a
differential drive wheeled robot with a diameter of 17cm. The robot
is equipped with 24 infrared sensors evenly distributed along
the robot’s body and capable of detecting objects in a range
of 10cm. Moreover, it was equipped with a rotating laser
sensor capable of detecting obstacles at longer distance.
Experiments were run in simulation by using the FARSA
open-software tool (Massera et al., 2013) that includes an accurate
simulator of the robot and of the environment.
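
The cleaning performance measure described above can be illustrated with a minimal Python sketch. It assumes that a 20x20cm cell counts as visited as soon as the centre of the robot falls inside it, and that the set of cells belonging to the environment (`free_cells`) is given; neither detail is specified in the text, and the function names are illustrative.

```python
import math

CELL_SIZE = 0.20  # metres; side of each non-overlapping 20x20cm cell


def visited_cells(trajectory, cell_size=CELL_SIZE):
    """Return the grid cells that contain at least one (x, y) robot position.

    Assumption: a cell is visited when the robot's centre lies inside it.
    """
    return {(math.floor(x / cell_size), math.floor(y / cell_size))
            for x, y in trajectory}


def cleaning_performance(trajectory, free_cells, cell_size=CELL_SIZE):
    """Fraction of the environment's cells visited at least once in a trial."""
    visited = visited_cells(trajectory, cell_size) & free_cells
    return len(visited) / len(free_cells)


# Hypothetical 1m x 1m square room, i.e. a 5 x 5 grid of cells.
room = {(i, j) for i in range(5) for j in range(5)}
# The robot drives once along the bottom row of cells.
path = [(0.1, 0.1), (0.3, 0.1), (0.5, 0.1), (0.7, 0.1), (0.9, 0.1)]
print(cleaning_performance(path, room))  # 5 of 25 cells visited
```

Revisiting a cell does not increase the score, so this measure rewards coverage rather than distance travelled.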

The robots’ neural controller 

The robots are provided with a feedforward neural
network controller without recurrence. In both experiments the
robots are equipped with eight sensory neurons that encode
the average activation state of eight groups of three adjacent
infrared sensors each and two motor neurons that encode the
desired speed of the robot’s two wheels. The sensory
neurons are fully connected to the motor neurons and to the
hidden neurons (if present), and the hidden neurons are fully
connected to the motor neurons. Hidden and motor
neurons are provided with biases. The state of the hidden and
motor neurons is computed on the basis of the logistic
function. The state of the sensory neurons and the desired speed
of the robot’s wheels are updated every 50ms. Experiments
have been replicated in the following two experimental
conditions:

(S) Simple: The robots are only provided with the
infrared sensors.

(T) Time: The robots are provided with an additional
sensory neuron that encodes the time passed since the
beginning of the current cleaning session (trial), i.e. whose
activation state varies linearly between 1.0 and 0.0 during
the course of the trial. This sensor has been added to
enable the robot to vary its behavior during the course of a
cleaning session. Notice that this sensor enables the robot to
access information extracted from the robot’s internal
environment (e.g. a robot clock situated inside the robot body)
while the other sensors enable the robot to access
information extracted from the external environment.

The connection weights and biases, which determine the
robot’s behavior, are initially set randomly and evolved as
described in the section below.
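
As an illustration of the controller described above, the following Python sketch computes the motor activations for the variant without hidden neurons. It is a minimal sketch rather than the FARSA implementation: the function names are illustrative, the time neuron is assumed to start at 1.0 and reach 0.0 at the end of the trial, and the mapping from motor activation to wheel speed is left out because the text does not specify it. The number of control steps per trial follows from the figures given in the text (6 minutes 15 seconds at one update every 50ms).

```python
import math

STEPS_PER_TRIAL = 7500  # 375 s per trial / 0.05 s per update


def logistic(x):
    """Logistic activation used for hidden and motor neurons."""
    return 1.0 / (1.0 + math.exp(-x))


def group_infrared(ir_readings):
    """Average 24 infrared readings into 8 groups of 3 adjacent sensors."""
    assert len(ir_readings) == 24
    return [sum(ir_readings[i:i + 3]) / 3.0 for i in range(0, 24, 3)]


def time_input(step, total_steps=STEPS_PER_TRIAL):
    """Time neuron of condition (T): assumed to decay linearly from 1.0 to 0.0."""
    return 1.0 - step / total_steps


def motor_outputs(inputs, weights, biases):
    """Two logistic motor neurons fully connected to the sensory neurons.

    weights holds one row of connection weights per motor neuron.
    """
    return [logistic(b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weights, biases)]


# Condition (T): 8 grouped infrared inputs plus the time neuron.
inputs = group_infrared([0.0] * 24) + [time_input(0)]
weights = [[0.0] * 9, [0.0] * 9]  # placeholder weights, normally evolved
print(motor_outputs(inputs, weights, [0.0, 0.0]))
```

With all weights and biases at zero both motor neurons sit at the midpoint of the logistic function, which makes the sketch easy to check by hand.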

The evolutionary algorithm 

The initial population consists of 20 randomly generated
genotypes, which encode the connection weights and biases
of 20 corresponding individual robots (each parameter is
encoded by 8 bits and normalized in the range [-5.0, +5.0]).
Every generation, each individual is evaluated for three trials
in environments that randomly vary in dimension within
the limits indicated above. The fitness of each trial is
calculated as the percentage of 20x20cm portions of
the environment that are visited by the robot at least once
during the trial. The total fitness is calculated by averaging
the fitness obtained during the three trials. Each individual
is allowed to generate one offspring, which is also evaluated
for three trials. The 20 offspring are generated by
creating a copy of the parent genotype and by mutating each bit
with a 2% probability. Each offspring genotype either
replaces the genotype of one of the worst parents or is
discarded, depending on whether or not the offspring
outperforms some of the parents. Each evolutionary experiment
was replicated 20 times starting from different randomly
generated initial populations.
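
The steps above can be sketched in Python as follows. This is one possible reading of the algorithm, not the authors' implementation: the linear mapping from 8-bit genes to [-5.0, +5.0] is an assumption, the replacement rule (each offspring replaces the current worst individual only if it is strictly fitter) is one interpretation of the text, and `evaluate` is a hypothetical stand-in for the average cleaning performance over three trials.

```python
import random

BITS_PER_PARAM = 8
MUTATION_RATE = 0.02
POP_SIZE = 20


def decode(genotype):
    """Map each 8-bit gene onto [-5.0, +5.0] (linear mapping is an assumption)."""
    params = []
    for i in range(0, len(genotype), BITS_PER_PARAM):
        bits = genotype[i:i + BITS_PER_PARAM]
        value = int("".join(str(b) for b in bits), 2)
        params.append(-5.0 + 10.0 * value / (2 ** BITS_PER_PARAM - 1))
    return params


def mutate(genotype, rng):
    """Copy the parent genotype, flipping each bit with 2% probability."""
    return [1 - b if rng.random() < MUTATION_RATE else b for b in genotype]


def generation(population, evaluate, rng):
    """Evaluate all parents, create one offspring each, keep the fitter ones."""
    population = list(population)
    fitnesses = [evaluate(g) for g in population]
    offspring = [mutate(g, rng) for g in population]
    for child in offspring:
        child_fitness = evaluate(child)
        worst = min(range(len(population)), key=lambda i: fitnesses[i])
        if child_fitness > fitnesses[worst]:
            population[worst] = child
            fitnesses[worst] = child_fitness
    return population


def onemax(genotype):
    """Toy fitness (fraction of 1 bits), standing in for the robot evaluation."""
    return sum(genotype) / len(genotype)


rng = random.Random(0)
pop = [[rng.randint(0, 1) for _ in range(2 * BITS_PER_PARAM)]
       for _ in range(POP_SIZE)]
for _ in range(50):
    pop = generation(pop, onemax, rng)
print(max(onemax(g) for g in pop), decode(pop[0]))
```

In the actual experiments the evaluation is stochastic, since each trial uses a randomly varied environment, so re-evaluating the parents every generation (as done here) reduces the chance that an individual survives on one lucky evaluation.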

Results 

In this section we first describe the performance achieved in
the different experimental conditions. As we will see, the
cleaning task in this concave environment requires the
exhibition of at least two sub-behaviors that differ in form and
function: an exploration behavior that enables the robot to
explore the large central area and a wall-following behavior
that enables the robot to explore the peripheral areas and the
borders of the central area. The possibility to discover and
to display these two behaviors rather than a single
undifferentiated behavior crucially depends on the characteristics of
the robot’s neural controller, as demonstrated by the fact that
the behavior and the performance vary significantly between the
two experimental conditions.

Then we will discuss the mechanisms that support
behavioral differentiation and arbitration by analysing the
behavioral solutions found in the different experimental
conditions. As we will see, the two most important mechanisms
that support the evolution of multiple behaviors are the
ability to perceive and to generate affordances (i.e. opportunities
for behaviors) and the possibility to flexibly and properly
handle behavioral transitions.

Performance and efficacy of modular versus 
non-modular solutions 

By post-evaluating the best robot of the last generation of
each replication for 500 trials we can see how the evolved
robots reach close to optimal performance in the temporal
(T) experimental condition and relatively low performance
in the simple (S) condition (Figure 2, top). The
performance in the two experimental conditions differs
significantly (Mann-Whitney U, p<0.05). The
performance obtained in the experiments in which the robots
were also provided with internal neurons (Figure 2,
bottom) does not significantly differ from that of the experiments
without internal neurons (Mann-Whitney U, p>0.05).

The analysis of the behaviors displayed by the best robots
of the last generation indicates that the performance level
correlates with the ability of the robots to display multiple
behaviors. This is clearly illustrated by the behavior
displayed by the best (S) and (T) robots, which achieved a
fitness of 67.4% and 82.8%, respectively. While (S) displays
a single uniform behavior throughout the trial (Figure 3, top), (T)
is capable of performing two well-differentiated behaviors
(Figure 3, bottom).

Indeed, the best robot with a simple architecture (S) al¬ 
ways behaves in the same manner during the successive 
phases of the trial (Figure 3, top-left). In particular it avoids 
walls and obstacles by sharply turning with an angle of 45- 
90 degrees (depending on the relative angle with which the 
robot approaches the obstacle) and moves straight when it is 
far from obstacles. Through the exhibition of this behavior 
the robot spends most of the time exploring the large central 
portion of the environment and only occasionally it explores 
the peripheral corridors when it happens to approach them 
with a direction that is almost orthogonal to the entrance of 
the corridor. The robots of the other replications of the
experiments show qualitatively similar behaviors (results not
shown). The best robot with the time neuron architecture

Figure 2: Boxplots of performance in the cleaning task. The
top and bottom figures report the results obtained without 
internal neurons and with internal neurons, respectively. The 
boxplots display the performance of the best robot of the last 
generation in the two experimental conditions, i.e. in the 
simple (S) and temporal (T) conditions. Each box displays 
the performance of the best robot of 20 replications of each 
experiment. The performance is indicated by the percentage 
of cleaned cells within the walls. The value corresponding 
to optimal performance is unknown but is reasonably below 
1.0 given that the robots have a rather limited cleaning time. 


(T), instead, shows two well differentiated behaviors: (i) an 
initial exploration behavior that is realized by producing a 
progressively larger curvilinear trajectory that enables the 
robot to explore the large central portion of the environment, 
and (ii) a wall-following behavior that enables it to explore 
all the peripheral areas of the environment (Figure 3, top- 
right). Although the way in which the exploration behavior 
is realized varies in different replications of the experiment, 
well-differentiated exploration and wall-following behaviors 
are clearly observable in all cases (results not shown). The 
high performance of these robots is due to their ability to 
display different behaviors, which are specialized for the ex¬ 
ploration of large open areas and peripheral areas, and to 
carefully tune the time duration of the two behaviors.
Indeed, the relative duration of the two behaviors determines
whether the robot spends enough time exploring the large
central area while keeping enough time to explore all the
peripheral areas of the environment. A qualitative analysis of
the first 10 replications showed that in the two best robots,
which clearly outperform the best robots of the other 8
replications, the transition occurs at 3.17±0.11 min. This
transition time is optimal or nearly optimal, as demonstrated
by the fact that post-evaluation tests performed by slowing
down or speeding up the robot's internal clock, and
consequently the behavior transition, led to significantly
worse performance (results not shown).




Figure 3: Typical trajectories displayed by the best robots of 
the two experimental conditions without hidden units. The 
portions of the trajectory produced during the first, second, 
and third part of the trial (i.e. from step 1 to 2500, from step 
2501 to 5000, and from step 5001 to 7500, respectively) are 
shown with different colours and line style. 


On the mechanisms supporting behavior 
differentiation and arbitration 

We have seen how controllers able to display multiple
behaviors can enable the adaptive robots to achieve better
performance, and that the emergence of this ability depends
on the characteristics of the robots' neural controllers.
We will now focus on the mechanisms supporting
behavior differentiation and arbitration.

Before entering into this, it is important to point out that
the behavior displayed by an embodied and situated agent is
a dynamical process unfolding in time that results from the
robot/environmental interactions. This implies that the
organization of behavior varies at different time scales.
Moreover, this implies that the sensory states experienced by the
robot at a given time step are co-determined by the actions 
produced by the robot during previous robot/environmental 
interactions. If we use the term affordance introduced by 
Gibson (1979) to indicate sensory states that elicit the pro¬ 
duction of behaviors, this implies that the affordances are 
not only extracted through sensors from the internal and/or 
the external environment but are also generated by the robot 
itself through actions. 

The analysis of the behavior exhibited by the robots at a 
short time scale (i.e. at a time scale of seconds) indicates that 
in all experimental conditions robots tend to exhibit at least 
two different low-level behaviors: (i) an obstacle-avoidance 
behavior that consists in turning while the robot detects an 
obstacle on its frontal side, and (ii) a move-forward behavior 
that consists in moving straight or almost straight while the 
robot does not detect obstacles on its frontal side. This
implies that at a short time scale all robots of all experimental
conditions displayed a certain kind of functional modularity.
This type of modularity always evolves because it plays a
fundamental role (i.e. it enables the robot to avoid getting
stuck and to keep exploring the environment) and because it
is supported by affordances that are always available and
easy to use. Indeed,
independently from the way in which the robot behaves, it 
will always experience a lack of activation on the frontal in¬ 
frared sensors when the robot/environment context affords a 
move-forward behavior and an activation on the frontal in¬ 
frared sensors when the robot/environmental context affords 
an obstacle-avoidance behavior. The infrared sensors there¬ 
fore always enable the robot to perceive when the former or 
the latter behavior should be produced and when the transi¬ 
tion between the two behaviors should occur. 

This ideal situation, however, in which the robot can rely
on robust and ready-to-use affordance states, characterizes
only a few lucky cases (incidentally, this probably explains
why the combination of obstacle-avoidance and navigation 
behaviors represents a widely used experimental scenario in 
robotics). In other cases, the affordance states supporting 
behavior differentiation and arbitration should be extracted 
through internal elaboration and/or generated through the 
exhibition of appropriate behaviors. 

As we have seen in the previous section, the studied clean¬ 
ing task requires behavioral diversification also at a higher 
time scale, e.g. it requires the exhibition of an exploration 
and a wall-following behavior lasting for minutes. In this 
case however, the robot cannot rely on ready-to-use affor¬ 
dances that indicate when the robot should display the first 
or the second behavior and when the robot should switch 
from one behavior to the other. To achieve this kind of
modularity the evolving robots should find a way to: (i) keep
producing the same behavior for a prolonged period of time,
(ii) switch behavior at the right moment, and (iii) realize a
suitable transition during behavior switch. We will illustrate
in detail how the evolved robots manage to master these
requirements in the next three sub-sections.

Notice that the evolution of context-dependent behaviors
requires the concurrent development of two interdependent
skills, the ability to produce a new behavior and the ability 
to regulate appropriately when the new behavior should be 
exhibited (West-Eberhard, 2003). 

Producing behaviors for prolonged periods of time 

All evolved robots solve the problem of producing a given 
behavior for a prolonged period of time by realizing each 
behavior in a way that ensures that they keep experiencing 
stimuli of the right type during the execution of that behav¬ 
ior. In cases in which the robots should exhibit two differ¬ 
entiated behaviors, i.e. an exploration and a wall following 
behavior, this implies that they should realize the former and 
the latter behaviors in a way that ensures that they keep ex¬ 
periencing stimuli of type 1 and 2 while they exhibit the for¬ 
mer or the latter behavior, respectively, and should react to 
the stimuli of the two types by producing actions that enable 
them to keep producing the former or the latter behaviors, re¬ 
spectively. The two classes of stimuli thus assume the role of 
affordance for the first and for the second behaviors, respec¬ 
tively. These affordances are not directly available from the 
environment, as in the case of the states affording the obsta¬ 
cle avoidance and move-forward behavior discussed above, 
but are generated by the robots themselves through their
actions (i.e. through the ability to realize each behavior in a
way that ensures that the robot keeps experiencing the
corresponding affordances). This form of dynamical stability
presents some similarities with the one that can be obtained
in situated agents through homeokinesis (Der and Martius,
2012), a task-independent learning process that can enable
situated robots to synthesize temporarily stable behaviors,
although the mechanisms and processes through which this is
realized are completely different.

All (S) and (T) robots exploit this affordance generation
mechanism. The (T) robots, however, also exploit an
additional mechanism that helps them keep producing
each behavior for a prolonged period of time.

The problem of producing the same behavior for a pro¬ 
longed period of time is also solved by exploiting the cue 
provided by the state of the temporal neuron. Indeed, 
whether the robot keeps producing the exploration behav¬ 
ior or switches to the wall-following behavior also depends 
on the state of the temporal neuron (see Figure 4). The state 
of the time neuron influences the duration of the exploration 
behavior only during a critical phase, i.e. when the state 
of the time neuron is smaller than 0.6 and greater than 0.4. 
During the rest of the trial the ability of the robot to keep
producing the exploration behavior or the wall-following
behavior relies on the affordance generation mechanism
described above. Interestingly, in the case of the best (T)
robot, the temporal neuron is also used to progressively vary
over time the way in which the exploration behavior is
realized, so as to regulate the probability that the robot keeps
experiencing sensory states affording the execution of the
exploration or wall-following behaviors. Indeed, by initially
moving forward and turning left by several degrees, the robot
completely eliminates the possibility of encountering a wall
on its left side (i.e. the possibility of experiencing stimuli
affording the alternative wall-following behavior). Then, by
moving forward and progressively reducing the angle of turn
over time, the robot becomes progressively less averse to
experiencing stimuli affording the wall-following
behavior. This brings us to the question of how robots
manage to switch behavior.



Figure 4: Behavior produced by the best (T) robot during 
different trials in which it started from the same initial po¬ 
sition with systematically varied orientations and systemati¬ 
cally varied state of the time neuron. The red and blue lines 
represent the trajectories produced by the robot during trials 
in which it switches or does not switch to the wall-following 
behavior, respectively. The black lines represent the walls. 
For the sake of clarity we only show the local portion of the
environment in which the robot is located.

Switching between alternative behaviors 

The problem of switching between different behaviors is
also solved through affordance generation. To understand
how robots can act in a way that enables them to experience
both stimuli affording the current behavior and stimuli
affording the alternative behavior, we should reformulate
the definition of affordance generation in probabilistic
terms. Evolved robots solve the problem of producing a
given behavior for a prolonged period of time and the
problem of switching behavior by realizing each behavior in a
way that ensures that they keep experiencing stimuli
affording the current behavior with a given high probability and
stimuli affording the alternative behavior with a given low
probability, respectively.
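
The probabilistic formulation can be made concrete with a deliberately simplified sketch. Assume, purely for illustration, that at every time step the robot experiences a stimulus affording the alternative behavior with a fixed probability `q`, independently of the past. This memoryless assumption and the values of `q` below are ours and are not measured from the evolved robots. Under this assumption the duration of a behavior follows a geometric distribution.

```python
import math


def duration_stats(q):
    """Mean and standard deviation (in time steps) of the duration of a
    behavior that ends with probability q at every step (geometric model)."""
    mean = 1.0 / q
    std = math.sqrt(1.0 - q) / q
    return mean, std


# Hypothetical per-step switching probabilities, chosen only for illustration.
for q in (0.01, 0.001):
    mean, std = duration_stats(q)
    print(f"q={q}: mean={mean:.0f} steps, std={std:.0f} steps, "
          f"std/mean={std / mean:.2f}")
```

In this simplified model the standard deviation of the duration is almost as large as its mean whenever `q` is small, which suggests why a purely stimulus-driven switch is hard to time precisely without an additional cue.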

In the case of the robot evolved in the (T) experimental
condition, the switch is regulated by both the stimuli
experienced by the robot (i.e. by affordance generation) and
the cue provided by the robot's internal clock. This double
regulation enables the best (T) robot to carefully balance
the time allocated to the two types of behavior and to
reduce the variability among trials (i.e. the transition occurs
at 3.17±0.11 min). The double regulation process is
demonstrated by the analysis of the trajectories produced during
a series of trials in which the robot always starts from the
same position and the orientation of the robot and the state
of the time neuron are systematically varied. As shown in
Figure 4, whether the robot switches to the wall-following
behavior depends both on the state of the internal clock and
on the state of the infrared sensor when the robot approaches
the wall. Overall this shows that whether the switch between
the two behaviors occurs or not depends both on the state of
the internal clock and on the way in which the exploration
behavior is realized, which, in turn, influences the type of
stimuli that the robot experiences. As mentioned above, in
the case of the best (T) robot, the state of the time neuron
is not only used to regulate directly the probability that the
robot switches behavior (i.e. the probability that the robot
initiates a wall-following behavior in a given relative position
in the environment) but is also used to regulate the way in
which the exploration behavior is realized, which in turn
influences the probability that the robot will later experience
stimuli affording the wall-following behavior.

Realizing suitable transitions during behavior switch

The connectedness of behaviors, i.e. the fact that alternative 
behaviors are semi-discrete and semi-dissociable units that 
are only partially independent, implies that the transitions 
between behaviors should be handled with care. In the case 
of our experiments, in particular, the transition between the 
exploration and the wall-following behavior requires special 
care since the latter behavior can only be produced when 
the robot is located near a wall and when the wall to be fol¬ 
lowed is located on a specific side of the robot. Indeed, the 
analysis of the evolved robots shows that the way in which 
the behavior transitions are handled in evolved robots has an 
important impact on the robots' performance.

The best solution to the transition problem was discovered
by the two best replications of the (T) robot (see Figure 3,
bottom). Indeed, as we mentioned above, these robots
exploit the cue provided by the internal clock to gradually
modify the exploration behavior so as to ensure that
the robot will always reach a relative location with respect
to the walls from which the wall-following behavior can be
effectively triggered during the critical period (i.e. at
3.17±0.11 min). Overall this leads to an extremely timely,
smooth and effective transition that enables these robots to
outperform all other robots.

The importance of realizing smooth transitions and the 
importance of executing preparatory actions can be appreci¬ 
ated by observing the cases in Figure 4, in which the values 
of the internal clock are set to 0.35, 0.25 and 0.15. In nat¬ 
ural conditions, when the internal clock assumes these val¬ 
ues, the robot always produces the wall-following behavior. 
The robot, however, is only able to initiate a wall-following 
behavior when it is located near a wall that is situated on 
its right side. If this prerequisite is not satisfied the
wall-following behavior will not be exhibited. In normal
conditions this problem never arises since the robots have evolved
the ability to perform, during the execution of the explo¬ 
ration behavior, the preparatory actions that enable the suc¬ 
cessive execution of the wall-following behavior. 

Conclusions 

In this paper we showed how robots evolved for the ability 
to perform a cleaning task can develop functional modular 
solutions that involve the exhibition and the alternation of 
differentiated behaviors playing specialized functions (i.e. 
cleaning large open areas and cleaning narrow peripheral ar¬ 
eas, respectively). 

The development of differentiated behaviors, in our ex¬ 
periments, is not realized through the subdivision of the con¬ 
trol system into modules and/or through the utilization of 
differentiated training processes. Instead, it simply origi¬ 
nates as a consequence of the adaptive advantage provided 
by the possibility to display multiple differentiated behav¬ 
iors. Indeed, robots displaying multiple differentiated be¬ 
haviors achieved better performance with respect to robots 
displaying a single behavior. 

This approach provides a series of advantages with re¬ 
spect to structural modular approaches in which the synthe¬ 
sis of robots displaying multiple behaviors is realized by us¬ 
ing well-separated control modules that are responsible for 
the production of different corresponding behaviors and that 
eventually are designed and/or trained independently. In
particular it releases the designer from the burden of identifying the
way in which the overall problem can be decomposed in sub¬ 
problems to be solved through the exhibition of specialized 
behaviors. More importantly, it enables the adapting robots 
to develop behaviors that are not only optimized with respect 
to the capability to accomplish the corresponding functions 
but that are also optimized with respect to their ability to 
operate effectively together. More specifically, the adapted
behaviors are realized in a way that ensures a smooth
transition between alternative sub-behaviors and in a way that
ensures that the preparatory actions that are necessary to
initiate or to carry on a given behavior are realized before the
robot initiates that behavior.

The analysis of the obtained results indicates that the
mechanisms that support the evolution of functionally
modular solutions are the ability to perceive affordances (i.e.
perceptual states encoding opportunities for behaviors) and the
ability to realize smooth and effective transitions between
different behaviors.

The perception of affordance constitutes a prerequisite for 
the possibility to develop differentiated behavior and for the 
possibility to effectively arbitrate them, i.e. selecting the be¬ 
havior that is appropriate for the current robot/environmental 
context and regulating the duration of each behavior. Inter¬ 
estingly, the basic mechanism that is used by evolving robots 
to perceive affordances is affordance generation, i.e. the 
ability to realize each behavior in a way that ensures that the 
robot keeps experiencing sensory states affording the current
behavior with a given high probability and sensory states
affording alternative behaviors with a given low probability.

The limitations of this affordance generation mechanism, 
e.g. the inability to finely tune the duration of behaviors, 
are overcome by using additional regulatory processes that 
rely on internal cues. In particular, in the case of the best 
evolved robot this is realized by complementing the basic 
affordance generation mechanism with two additional
regulatory processes. One of them consists in using the state
of the internal clock to progressively vary the way in which
the exploration behavior is realized so as to progressively
increase the probability that the robot will experience stimuli
affording the wall-following behavior. The other additional
regulatory process consists in using the state of the internal
clock to vary qualitatively the way in which the robot
reacts in a specific environmental situation (e.g. to determine
whether the robot avoids an obstacle by turning left or right,
which in turn determines whether the robot will keep producing
the exploration behavior or will switch to the wall-following
behavior).

Overall this implies that behavior arbitration in the best 
evolved robots is realized through the combined effects of 
multiple partially redundant regulatory processes that oper¬ 
ate through weak interactions. 

Future studies should investigate whether this approach 
can enable evolving robots to find effective solutions in more 
complex tasks/scenarios.

Abstract

In vitro constitution of biological functions is a useful
strategy to understand the physicochemical principles
underlying biological functions (Luisi and Stano, 2013). To
date, various cellular functions have been artificially
constituted, including self-replication of genetic information,
gene expression, and so on. As a self-replication system of
genetic information, our group has reconstituted an RNA
genome replication system coupled with translation
(Ichihashi et al., 2013). However, all living organisms have
DNA genomes, which replicate with DNA replicases
translated from the genomes via mRNA transcription. A plan
for the construction of such a DNA replication system
coupled with transcription and translation was proposed
approximately 10 years ago using a rolling-circle-type
replication scheme (Fig. 1, Forster and Church, 2006), although
it has not been realized yet. In this study, we attempted to
constitute the transcription- and translation-coupled DNA
replication (TTcDR) system and perform an evolution
experiment to make the system recursive. The aim of the
present work is to show a method for constructing a
DNA genome replication system which leads to the in vitro
construction of an artificial cell.

We prepared a circular DNA encoding the phi29 DNA
polymerase gene under the control of the T7 promoter. The
circular DNA was mixed with the reconstituted cell-free translation
system derived from Escherichia coli (Shimizu et al., 2001),
T7 RNA polymerase and RNase inhibitor. To perform rolling-
circle replication, we also added dNTPs, a random oligo DNA
as a primer, and yeast pyrophosphatase to the mixture (Dean
et al., 2001). We first attempted to optimize the concentrations
of several components (T7 RNA polymerase, RNase inhibitor,
dNTPs, random DNA oligo, yeast pyrophosphatase, NTPs and
tRNA). We found that the optimum concentrations of NTPs,
tRNA, T7 RNA polymerase and RNase inhibitor are in narrow
ranges, indicating that these components are inhibitory to the
TTcDR reaction at high concentrations. At the optimized
concentrations of all components, the replication product
DNA increased approximately 100-fold compared to the original
conditions. The kinetics of DNA replication showed a
concave curve, suggesting that DNA replication accelerated
due to the increasing phi29 DNA polymerase over time
(Sakatani et al., 2015). This is consistent with the expected
kinetics of the TTcDR reaction.
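
The expected acceleration can be illustrated with a toy kinetic sketch. It is not fitted to the measurements: the function name `simulate_ttcdr`, the rate constants `k_expr` and `k_rep`, and the arbitrary units are assumptions made here for illustration. The sketch assumes that polymerase is expressed at a constant rate and that DNA is synthesized at a rate proportional to the amount of polymerase; it ignores the further use of the product DNA as a template.

```python
def simulate_ttcdr(k_expr=1.0, k_rep=1.0, dt=0.01, steps=500):
    """Toy kinetics in arbitrary units, integrated with the Euler method.

    Polymerase accumulates at a constant rate k_expr, and DNA is
    synthesized at a rate proportional to the current polymerase level.
    Returns the amount of product DNA after each time step.
    """
    polymerase, dna = 0.0, 0.0
    trajectory = []
    for _ in range(steps):
        polymerase += k_expr * dt        # expression of the polymerase
        dna += k_rep * polymerase * dt   # polymerase-limited DNA synthesis
        trajectory.append(dna)
    return trajectory


traj = simulate_ttcdr()
first_half_gain = traj[249] - traj[0]
second_half_gain = traj[499] - traj[249]
# More DNA is produced in the second half of the run than in the first.
print(second_half_gain > first_half_gain)
```

In this sketch the product DNA grows roughly quadratically with time, so the curve bends upward, which is the qualitative behavior described above.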

A shortcoming of this TTcDR system is the lack of 
recursiveness. The initial template DNA is circular, whereas 
the product is a linear concatemer. To make this system 
recursive, the linear DNA product must be circularized. We 
next attempted to circularize the product DNA using Cre 
recombinase, which has been reported to circularize the linear 
DNA produced by phi29 DNA polymerase (Huovinen et al.,
2011), as proposed previously (Forster and Church, 2006). 
We added Cre recombinase into the optimized TTcDR system 
but found that Cre recombinase significantly inhibits the DNA 
replication catalyzed by phi29 DNA polymerase. This result 
indicates that Cre recombinase and phi29 polymerase do not
work simultaneously. Therefore, to make this DNA 
replication system recursive, we have to add Cre recombinase 
after replication and remove it before the next round of 
replication. 


[Figure 1 diagram labels: Circular DNA; Transcription; mRNA; Translation; phi29 DNA polymerase; Linear DNA]


Figure 1. Schema of the TTcDR system. This system 
consists of a circular DNA encoding phi29 DNA polymerase 
gene, T7 RNA polymerase, and a reconstituted translation 
system. T7 RNA polymerase transcribes mRNA from the 
circular DNA and phi29 DNA polymerase is translated. The 
polymerase initiates DNA polymerization to produce a long 
linear DNA. The replication product is further used as a 
template for transcription to produce phi29 DNA polymerase.

To establish a more automatic recursive replication, which
would be suitable for an artificial cell, we next attempted to
develop a Cre-resistant phi29 polymerase by using
evolutionary engineering. We first established a cycle to
repeat the TTcDR reaction in the presence of Cre recombinase
(Fig. 2). This cycle consisted of four stages. (i) Encapsulation:
the TTcDR system was encapsulated in a water-in-oil
emulsion with Cre recombinase. A water droplet was
expected to contain less than one DNA molecule. (ii) DNA
replication: the emulsion was incubated for the TTcDR
reaction. (iii) DNA recovery: the droplets were collected and
the product linear DNA was amplified by PCR. (iv)
Circularization: after converting both ends of the DNA
into sticky ends, the linear DNA was circularized by ligation.
The new circular DNA was then re-encapsulated into the
emulsion for the next round of the replication cycle. The
product DNA concentration was measured using quantitative
PCR. In this system, mutations are introduced into the DNA
through polymerization errors. If a mutant DNA that encodes
a Cre-resistant phi29 polymerase appears, it should dominate
the population. We repeated this cycle 40 times and found
that the average replication ability of the DNA population
increased gradually even in the presence of increasing amounts
of Cre recombinase. This indicated that more Cre-resistant
mutant DNAs were obtained.
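
The encapsulation stage can be illustrated with a short calculation. The sketch below assumes that DNA molecules are distributed among droplets at random, so that the number of molecules per droplet follows a Poisson distribution. Both this assumption and the mean occupancy `lam = 0.1` are introduced here for illustration and are not values reported for the experiment.

```python
import math


def poisson_pmf(k, lam):
    """Probability of finding exactly k molecules when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def occupancy(lam):
    """Fractions of droplets holding zero, one, or several DNA molecules."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    multiple = 1.0 - empty - single
    return empty, single, multiple


lam = 0.1  # hypothetical mean number of DNA molecules per droplet
empty, single, multiple = occupancy(lam)
print(f"empty={empty:.3f}, single={single:.3f}, multiple={multiple:.4f}")
# Share of the occupied droplets that hold exactly one molecule.
print(f"single among occupied={single / (1.0 - empty):.3f}")
```

Under this assumption, when the mean occupancy is well below one, most occupied droplets hold a single DNA molecule, so the polymerase expressed in a droplet presumably acts mainly on the DNA that encodes it.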

In this study, we constructed the TTcDR system and 
obtained mutants of the DNA genome, the replication of 
which is tolerant to Cre recombinase. These results provide a 
step toward the in vitro construction of an artificial cell 
containing a recursive replication system of a DNA genome.

Figure 2. Cycle of the TTcDR for the evolution experiment.

(i) Encapsulation: the TTcDR system is encapsulated in a
water-in-oil emulsion with Cre recombinase. (ii) DNA
replication: incubation for the TTcDR reaction. (iii) DNA
recovery: the droplets are collected and the product DNA is
amplified. (iv) Circularization: the product linear DNA is
circularized by ligation.

Abstract

The critical mutation rate (CMR) determines the shift be¬ 
tween survival-of-the-fittest and the survival of individuals 
with greater mutational robustness (the “flattest”). Small pop¬ 
ulations are more likely to exceed the CMR and become less 
well adapted; understanding the CMR is crucial to under¬ 
standing the potential fate of small populations under threat 
of extinction. Here we present a simulation model capable of 
utilising input parameter values within a biologically relevant 
range. A previous study identified an exponential fall in CMR 
with decreasing population size, but the parameters and out¬ 
put were not directly relevant outside artificial systems. The 
first key contribution of this study is the identification of an 
inverse relationship between CMR and gene length when the 
gene length is comparable to that found in biological popu¬ 
lations. The exponential relationship is maintained, and the 
CMR is lowered to between two and five orders of magnitude
above existing estimates of per base mutation rate for a vari¬ 
ety of organisms. The second key contribution of the study 
is the identification of an inverse relationship between CMR 
and the number of genes. Using a gene number in the range 
for Arabidopsis thaliana produces a CMR close to its known 
mutation rate; per base mutation rates for other organisms 
are also within one order of magnitude. This is the third key 
contribution of the study as it represents the first time such 
a simulation model has used input and produced output both 
within range for a given biological organism. This novel con¬ 
vergence of CMR model with biological reality is of partic¬ 
ular relevance to populations undergoing a bottleneck, under 
stress, and subsequent conservation strategy for populations 
on the brink of extinction. 

Introduction 

Fitter genotypes can be outcompeted by genotypes with 
greater robustness when the mutation rate exceeds a critical 
mutation rate (CMR); in terms of fitness landscapes, narrow 
high fitness peaks may be lost, while broader, lower peaks 
are maintained by a population of reproducing sequences. 
This so-called “survival-of-the-flattest” has been observed
in in silico evolving systems (Wilke, 2005). CMR has an 
exponential dependence on population size in both haploid 
(Channon et al., 2011) and diploid populations (Aston et al., 
2013); as population size falls, the CMR above which fitter 
alleles are lost transitions unexpectedly from near-constant 


(the previous assumption in evolutionary biology) to drop 
exponentially for small populations. It has been verified that 
this model closely reproduces the established mathematical
relationship between population size and the “error
threshold” (ET; no mathematical model has yet been derived for
the CMR) (Aston et al., 2013). It is therefore possible that
CMRs in small populations could be within the range of bi¬ 
ological mutation rates. However, biological organisms typ¬ 
ically have lengths and numbers of genes orders of magni¬ 
tude higher than those used in models of ETs or CMRs, so 
how relevant such models are to real biological populations 
remains an open question. 
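
One simple way to see why sequence length matters is to compute the probability that a sequence is copied without any mutation. The sketch below is not the simulation model of this study: it only assumes that each base mutates independently with per base probability `mu`, and the values of `mu` and of the sequence lengths are illustrative.

```python
def error_free_copy_probability(mu, length):
    """Probability that a sequence of `length` bases is copied with no
    mutation when each base mutates independently with probability `mu`."""
    return (1.0 - mu) ** length


# Illustrative per base mutation rate and sequence lengths (assumptions).
mu = 1e-4
for length in (10, 1000, 100000):
    print(length, error_free_copy_probability(mu, length))
```

For a fixed per base rate, the chance of an error-free copy falls rapidly as the sequence becomes longer, so a per base mutation rate that short artificial sequences tolerate may be far too high for sequences of biological length.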

From Artificial to Biological Evolution: 
Mutation of Genes in Nature 

To bridge the gap between artificial and biological evolution 
it is paramount that, when implemented as a simulation, a 
model can be given parameter values within the range ob¬ 
served in biological organisms and subsequently output bio¬ 
logically realistic results. The models defined in Aston et al. 
(2013) used arbitrary values for parameters such as sequence 
length, selected for their suitability to provide results within 
a small timeframe. 

Whitlock et al. (2003) performed computer simulations 
to investigate the effects of varying the strength of selec¬ 
tion and mutational effects among dimensions. They used 
a model based on Fisher’s model of the geometry of adap¬ 
tation (Fisher, 1930), but used a hyperellipse in which the 
strength of selection along any axis was drawn from an ex¬ 
ponential distribution. They concluded that changing from a 
hypersphere to a hyperellipse, and thus introducing dimen¬ 
sions with stronger selection than others, had a negligible 
effect on their results. It was therefore decided to focus on 
the parameters of mutation rate, gene length, and gene num¬ 
ber; assuming equal strength of selection is not expected to 
affect the credibility of the results. 

Mutation Rates 

Figure 1: Two-peak fitness landscape, with one narrow
peak of high fitness (Peak 0), and one broader peak of lower
fitness (Peak 1). The fitness score is relative, and the width
and distance between the peaks are given in terms of
Hamming distance. Diagram adapted from Wilke (2005).

The mutation rate used in the simulation model is analogous
to the biological per base mutation rate (see Table 1).
Bacterial species were not included due to their use of lateral
gene transfer which is not currently included in the simu¬ 
lation model. Viruses, which are known to live very close 
to the ET (Eigen and Schuster, 1979), were not included 
due to the complexity and variety of reproduction techniques 
which include incorporation into a host genome. Nachman 
and Crowell (2000) obtained an estimate of the average mu¬ 
tation rate per nucleotide by comparing pseudogenes (genes 
that do not code for proteins or are never expressed) in hu¬ 
mans and chimpanzees. Baer et al. (2007) brought together 
the results of theoretical and empirical studies to list mu¬ 
tation rate estimates in a number of multicellular eukary¬ 
otes. Drake et al. (1998) list mutation rate estimates from 
studies using mutation accumulation and radiation experi¬ 
ments. Lynch (2010a) also lists mutation rates from vari¬ 
ous sources. Xue et al. (2009) obtained an estimate for the 
base substitution rate in the human Y chromosome through 
direct sequencing. Kumar and Subramanian (2002) con¬ 
ducted a computational analysis of 5669 genes from species 
of placental mammals. Keightley et al. (2009) did whole- 
genome shotgun sequencing of three mutation accumula¬ 
tion lines of Drosophila melanogaster , while Keightley et al. 
(2014) sequenced two parents and 12 offspring. Denver et al. 
(2004) provide a direct estimate of the mutation rate from a 
set of Caenorhabditis elegans mutation accumulation lines. 
Haag-Liautard et al. (2008) and Ossowski et al. (2010) pro¬ 
vide estimates using mutation accumulation lines. Durbin 
et al. (2010) examine variation in the sequence of the human 
genome. Lynch et al. (2008) provide a mutation rate esti¬ 
mate from complete genome sequencing of Sacccharomyces 
cerevisiae. Lynch (2010b) used existing data to estimate the 
mutation rate of various eukaryotes. 

Genetic Sequences 

Derelle et al. (2006), Sharma et al. (2005) and Lewin (2008)
list the lengths of various genes for various biological organisms
as between approximately 1000 and 140,000 bp; the sequence
length of 30 bp used to produce the results in Aston
et al. (2013) is small when compared with the length of
genes found in a wide range of natural species.

Aston et al. (2013) used a two-peak landscape, with the
height of peak 0 constant at 15 and the radius 2, the height of
peak 1 constant at 10 and the radius 5, and the Hamming distance
between the peaks set at 10 (Figure 1). If each peak in
the two-peak landscape is considered to be a different set of
alleles (variants of a gene), estimates of genetic distances between
alleles for various genes can be seen to be analogous
to the distance between the peaks. These estimates fall within
the range of 1 to 56 polymorphisms (Bryan et al., 2000; Ramkumar
et al., 2010). Similarly, the number of polymorphisms was
estimated to be at most 13 (including non-coding regions)
within various human genes studied by Cargill et al. (1999).
In both cases the value of 10 used for the distance between
peaks in Aston et al. (2013) is close to the range of numbers
listed; it was therefore decided to keep this number constant.
Varying the distance between the peaks may be an interesting
future study.

Longer sequences mean more bases that can potentially mutate
each generation, which leads to the following three hypotheses.

Hypothesis 1 

According to the drift-barrier hypothesis, drift prevails over
selection to determine mutation if the magnitude of the selection
coefficient $s$ is less than $1/N_e$ (where $N_e$ is effective
population size). The strength of selection that reduces
mutation rate through mutation-selection balance is
countered by $N_e$-dependent genetic drift (Sniegowski and
Raynes, 2013). Following this population size dependence,
we hypothesise that, for varying gene lengths and numbers,
the CMR will vary with population size, and that this will
occur in line with the exponential model identified in Aston
et al. (2013).
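
The drift-barrier condition quoted above can be illustrated with a minimal sketch. The function name and the example values of the selection coefficient and effective population size are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def drift_dominates(selection_coefficient: float,
                    effective_population_size: float) -> bool:
    """Return True when |s| < 1/Ne, the regime in which drift is
    expected to prevail over selection under the drift-barrier idea."""
    if effective_population_size <= 0:
        raise ValueError("effective population size must be positive")
    return abs(selection_coefficient) < 1.0 / effective_population_size


# Hypothetical selection coefficient, used only to show the size dependence.
s = 1e-3
for n_e in (10, 100, 10_000):
    regime = "drift" if drift_dominates(s, n_e) else "selection"
    print(f"Ne = {n_e:>6}: 1/Ne = {1.0 / n_e:.4g}, dominant force: {regime}")
```

The same selection coefficient falls on different sides of the barrier as the population grows, which is the population size dependence that Hypothesis 1 builds on.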

Hypothesis 2 

Drake summarized all studies up to 1990 and concluded
that the per nucleotide per generation mutation rate $u$ varies
inversely with genome size $G$ in microbes (Drake, 1991;
Sung et al., 2012). Eigen and Schuster (1979) theoretically
determined the ET in terms of selection pressure and sequence
length. Using this model, Ochoa et al. (2000) and
Ochoa (2006) found that longer sequence lengths lead to
lower ETs in genetic algorithms, defined by the equation
$p = \frac{\ln(\sigma)}{L}$, where $p$ is the ET on a single peak
landscape, $L$ is sequence length, and $\sigma$ is selection
strength which is
kept constant. Nowak (1992) theoretically determined the
ET in terms of the relative fitness of mutant and wild type
($a_1$ and $a_2$ respectively, where $a_2$ is assumed to have the
lower fitness) and the sequence length ($m$). Giving the ET
as $1 - q_{crit} \propto 1/m$, it can be seen that increasing $m$
will decrease the ET. In accordance with this and with Drake
(1991), it is expected that increasing the sequence length
will also lower the CMR.
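
As a rough worked example of why longer sequences are expected to lower the threshold, the sketch below evaluates the Eigen-style single-peak relation, assumed here in its commonly quoted form p = ln(sigma)/L. The value of sigma is purely illustrative, and the function name is hypothetical; the lengths are the 30 bp of Aston et al. (2013) and gene lengths mentioned in the text.

```python
import math


def error_threshold(sigma: float, length: int) -> float:
    """Per-base error threshold p = ln(sigma) / L for a single-peak
    landscape (assumed form; sigma is the selection strength)."""
    if sigma <= 1.0 or length <= 0:
        raise ValueError("need sigma > 1 and a positive sequence length")
    return math.log(sigma) / length


SIGMA = 1.5  # illustrative selection strength, not a value from the study

for length in (30, 1000, 2232, 27000):
    print(f"L = {length:>6} -> p = {error_threshold(SIGMA, length):.3e}")
```

With sigma held constant the threshold is inversely proportional to L, so a tenfold increase in length gives a tenfold decrease in p.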

Hypothesis 3 

We hypothesise that increasing the number of genes (while
keeping gene length constant) will further lower the CMR
as it will increase the overall sequence length. Gene numbers
within biological ranges are expected to lead to CMRs
close to the range of mutation rates observed for biological
species; it is expected biological organisms will be evolving
close to the mutation rate that results in the greatest levels of
adaptation.

Simulation Model 

The system used a two-peak fitness landscape (Figure 1),
with the height of peak 0 constant at 15 and the radius 2,
the height of peak 1 constant at 10 and the radius 5, and the
Hamming distance between the peaks set at 10 as per Aston
et al. (2013). Each individual consisted of one randomly assigned
maternal and one paternal sequence of alphabet size
4, and each sequence was split into $n$ genes of length $L$. Each
gene had an associated target sequence of length $L$ corresponding
to peak 0 and a target sequence corresponding to
peak 1. For example, if $n$ is set to 4, there will be target
sequences corresponding to peaks $0_1 1_1$, $0_2 1_2$, $0_3 1_3$,
and $0_4 1_4$. For simplicity, each peak 0 was set to be all 0s
and each peak 1 was randomly generated to be Hamming distance
10 away. Recombination was limited to one event per replication
as it was not the focus of the study.
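
The construction of the per-gene target sequences described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, symbols are encoded as integers 0 to 3, and the way the 10 differing positions are chosen (uniformly at random, each set to a non-zero symbol) is an assumption rather than the authors' documented procedure.

```python
import random

ALPHABET_SIZE = 4    # alphabet size stated in the text
PEAK_DISTANCE = 10   # Hamming distance between peak 0 and peak 1


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def make_targets(n_genes, gene_length, rng):
    """Return one (peak0, peak1) target pair per gene.

    Peak 0 is all 0s; peak 1 differs from it at exactly PEAK_DISTANCE
    randomly chosen positions (assumed sampling scheme).
    """
    if gene_length < PEAK_DISTANCE:
        raise ValueError("gene length must be at least the peak distance")
    targets = []
    for _ in range(n_genes):
        peak0 = [0] * gene_length
        peak1 = list(peak0)
        for pos in rng.sample(range(gene_length), PEAK_DISTANCE):
            # Any non-zero symbol guarantees a mismatch with peak 0.
            peak1[pos] = rng.randint(1, ALPHABET_SIZE - 1)
        targets.append((peak0, peak1))
    return targets


targets = make_targets(n_genes=4, gene_length=30, rng=random.Random(0))
print([hamming(p0, p1) for p0, p1 in targets])
```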

The dominance parameter $A$ was set to equal a fraction below
1.0 (0.999999999999999 specifically). This sets the relative
importance of the maternal and paternal alleles while
preventing either allele from drifting neutrally; if $A = 1.0$ the
fitness of only one allele is taken into account, while the
other can be anywhere in the fitness landscape. For each
individual, the fitness of each of its $n$ genes was calculated
as the Hamming distance of its maternal and paternal
sequences relative to each peak. The fitness values relative
to peak 0 were compared with the fitness values relative
to peak 1 and the highest of these selected to give a
single fitness value for both the maternal and paternal sequences.
The resulting maternal and paternal fitnesses were
compared and subsequently designated as $f_{max}$ and $f_{min}$.
The final relative fitness of each gene was calculated as
$f = (A \times f_{max}) + ((1 - A) \times f_{min})$. The overall fitness
of the individual was then taken to equal the minimum fitness
out of the $n$ genes present. The simulation was run for
a range of population sizes to confirm that the curves observed
in previous experiments (Channon et al., 2011; Aston et al.,
2013) are still observed as the length and number of genes are
increased.
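
The way the two allele fitnesses are combined per gene, and the genes combined per individual, can be sketched as below. The function names are hypothetical, and the per-allele fitness values are taken as given inputs (how they are derived from Hamming distance to each peak is not reproduced here).

```python
DOMINANCE = 0.999999999999999  # value of the dominance parameter given in the text


def gene_fitness(maternal_fitness, paternal_fitness, dominance=DOMINANCE):
    """Combine the two allele fitnesses of one gene:
    f = dominance * f_max + (1 - dominance) * f_min."""
    f_max = max(maternal_fitness, paternal_fitness)
    f_min = min(maternal_fitness, paternal_fitness)
    return dominance * f_max + (1.0 - dominance) * f_min


def individual_fitness(allele_fitness_pairs, dominance=DOMINANCE):
    """Overall fitness is the minimum gene fitness over all genes."""
    return min(gene_fitness(m, p, dominance) for m, p in allele_fitness_pairs)


# Three genes; the second has both alleles on the lower peak (height 10),
# so it alone determines the individual's fitness.
pairs = [(15.0, 15.0), (10.0, 10.0), (15.0, 10.0)]
print(individual_fitness(pairs))
```

Taking the minimum over genes means a single gene that has left peak 0 is enough to pull the whole individual down, which is why losing peak 0 in any one gene is informative for the CMR.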

To allow the simulation to complete within a realistic time
frame, it was optimised to cease running when any one gene
had lost peak 0; this was all the information required to determine
the CMR, which was recorded as the mutation rate
at which 95% of 2000 runs lost peak 0 within 10,000 generations
for any of the possible $n$ genes. Launching the simulation
for various combinations of parameter values was
also optimised to allow the mutation rate being tested for a
given gene number to progress to the next mutation rate once
100 out of the possible 2000 runs (corresponding to 5%)
have kept peak 0 for the duration of the simulation. Once
this threshold has been exceeded, less than 95% of the 2000
runs will have lost peak 0, and the CMR will not have been
reached. While this helped significantly with run time, further
optimisation will be required in the future; it is currently
not feasible to run the simulation for a wide range of population
sizes and mutation rates for gene numbers at the upper
end of the biological range.
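
The early-stopping logic described above can be sketched as follows. This is a hedged outline, not the authors' code: `run_loses_peak` is a hypothetical callable standing in for one full simulation run (returning True if any gene lost peak 0 within the generation limit), and scanning the candidate mutation rates in ascending order is an assumption.

```python
RUNS = 2000      # runs per mutation rate, as stated in the text
MAX_KEPT = 100   # 5% of runs; exceeding this means fewer than 95% lost peak 0


def threshold_reached(mutation_rate, run_loses_peak, rng,
                      runs=RUNS, max_kept=MAX_KEPT):
    """Return True if at least runs - max_kept runs lose peak 0.

    Stops as soon as more than max_kept runs have kept peak 0, because
    the 95% criterion can then no longer be met at this mutation rate.
    """
    kept = 0
    for _ in range(runs):
        if not run_loses_peak(mutation_rate, rng):
            kept += 1
            if kept > max_kept:
                return False
    return True


def find_cmr(mutation_rates, run_loses_peak, rng=None):
    """Return the lowest tested rate meeting the 95% criterion, else None."""
    for rate in sorted(mutation_rates):
        if threshold_reached(rate, run_loses_peak, rng):
            return rate
    return None


# Deterministic stand-in for a simulation run, for illustration only.
def toy_run(mutation_rate, rng):
    return mutation_rate >= 0.005


print(find_cmr([0.001, 0.002, 0.005, 0.01], toy_run))
```

At rates well below the CMR almost every run keeps peak 0, so the loop exits after roughly a hundred runs instead of two thousand, which is where the time saving comes from.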

Two approaches were taken when selecting parameter values
within the biological range. Firstly, a gene length of
1000 bp was selected as a small yet biologically realistic
gene length. This provided a gene length small enough to
allow the simulation to run for a range of gene numbers
within a realistic time frame. Secondly, based on the information
in Table 1 and the value of 2232 bp mean gene size
given by Derelle et al. (2006), Arabidopsis thaliana (thale
cress) was selected as a model organism with a relatively
short gene length. It is a plant native to Eurasia, with an effective
population size of between 250,000 and 300,000 (Cao
et al., 2011), known to contain 25,498 genes encoding proteins
from 11,000 families (The Arabidopsis Genome Initiative,
2000). More current estimates of gene number are
slightly higher but still within a close range for the purpose
of the model (Bevan and Walsh, 2005). The simulation was
run for population size 10 with a gene length of 1000 (as per
the previous runs) or 2000 (range of A. thaliana), but with
25,000 genes to bring the gene number into the range of
A. thaliana.

Results 

Increasing the gene length decreases the CMR in
line with the exponential model

Table 1: Mutation rates for various eukaryotic species. Mutation rate estimates are specified as the number of times a single
base will mutate spontaneously. If a timeframe (per generation, per cell division) is specified this is listed in the Unit column.
* refers to mutation rates used for reference in Figure 3.

| Species | Genome size (Mbp) | Mutation rate | Base/genome | Unit | Source |
|---|---|---|---|---|---|
| Human | 3080 | 1.00E-08 - 2.50E-08 * | Per base | Per generation | Nachman and Crowell (2000); Durbin et al. (2010); Lynch (2010a) |
| Human | 3080 | 1.75E+02 | Per genome | Per generation | Nachman and Crowell (2000) |
| Human | 3080 | 5.00E-11 - 6.00E-02 | Per base | Per cell division | Drake et al. (1998); Lynch (2010a) |
| Human | 3080 | 1.60E-01 | Per genome | Per cell division | Drake et al. (1998) |
| Human (Y chromosome) | 58 | 3.00E-08 | Per base | Per generation | Xue et al. (2009) |
| Human, chimpanzee | 3080 | 3.00E+00 | Per genome | Per generation | Baer et al. (2007) |
| Drosophila melanogaster | 120 | 4.65E-09 - 6.20E-08 | Per base | Per generation | Haag-Liautard et al. (2008); Keightley et al. (2009); Lynch (2010a); Keightley et al. (2014) |
| Drosophila melanogaster | 120 | 9.90E-01 - 1.20E+00 | Per genome | Per generation | Baer et al. (2007); Haag-Liautard et al. (2008) |
| Drosophila spp. | 120 | 7.00E-02 | Per genome | Per generation | Baer et al. (2007) |
| Drosophila melanogaster | 120 | 1.30E-10 - 3.40E-10 | Per base | Per cell division | Drake et al. (1998); Lynch (2010a) |
| Quail, chicken | 1050 | 4.90E-01 | Per genome | Per generation | Baer et al. (2007) |
| Sheep, cow | 2870 | 9.00E-01 | Per genome | Per generation | Baer et al. (2007) |
| Old World Monkey | | 1.90E+00 | Per genome | Per generation | Baer et al. (2007) |
| Mouse, rat | 2640 | 9.10E-01 | Per genome | Per generation | Baer et al. (2007) |
| Mouse | 2640 | 1.80E-10 | Per base | Per cell division | Drake et al. (1998) |
| Mouse | 2640 | 1.10E-08 | Per base | Per generation | Drake et al. (1998) |
| Saccharomyces cerevisiae | 12.1 | 3.30E-10 | Per base | Per generation | Lynch (2010a) |
| Saccharomyces cerevisiae | 12.1 | 3.30E-10 | Per base | Per cell division | Lynch et al. (2008) |
| Average mammalian | | 2.20E-09 * | Per base | Per genome/year | Kumar and Subramanian (2002) |
| Mammalian upper bound | | 2.61E-09 | Per base | Per genome/year | Kumar and Subramanian (2002) |
| Caenorhabditis elegans | 100 | 8.40E-09 - 2.10E-08 | Per base | Per generation | Denver et al. (2004); Haag-Liautard et al. (2008); Lynch (2010b) |
| Caenorhabditis elegans | 100 | 2.90E+00 | Per genome | Per generation | Lynch et al. (2008) |
| Arabidopsis thaliana | 157 | 7.10E-09 * | Per base | Per generation | Ossowski et al. (2010) |
| Arabidopsis thaliana | 157 | 6.50E-09 | Per base | Per generation | Lynch (2010b) |

Figure 2 shows the CMR for two of the sequence lengths
studied; increasing sequence length decreases the CMR in a
single-gene-per-individual diploid in silico evolving system
modelled on the biological process of meiosis, while maintaining
the exponential relationship with population size presented
in Aston et al. (2013). The length of genes present in
biological species varies greatly: A. thaliana has a mean gene
length of 2232 bp (Derelle et al., 2006), the average gene length
in humans is 27 kbp (Lewin, 2008), the upper bound for the
usual gene length range for flies and mammals is 100 kbp
(Lewin, 2008), and the longest gene in the collagen family
is 132.83 kbp (Sharma et al., 2005). Table 1 was used
to identify a known mutation rate for A. thaliana, the average
mammal, and humans as $7.1 \times 10^{-9}$, $2.2 \times 10^{-9}$, and
$2.5 \times 10^{-8}$ respectively (per base, per generation). Figure
3 shows each of these mutation rates plotted against their
respective gene lengths, along with the maximal CMR produced
when the simulation was run with sequence lengths
of 2000, 27000, 100000, and 150000. The maximal CMR
represents the value at which each curve has levelled out
(e.g., Figure 2), applicable to the range of population sizes
normally expected for each species without threat of extinction
(where population size refers to a local population rather
than the total number of individuals globally). It was taken
to be the CMR at population size 1000. Note the log scale
used for the mutation rate, as this enables the difference between
the curves and the biological mutation rates to be seen
clearly.
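
To illustrate what "levelled out" means for the maximal CMR, the sketch below evaluates an exponential model of the form y = A - B * exp(-N / C), the form given for the fitted line in Figure 2. The parameter values are hypothetical placeholders chosen only to show the shape of the curve; they are not fitted values from the study.

```python
import math


def cmr_model(population_size, a, b, c):
    """Exponential model y = a - b * exp(-N / c) for CMR against
    population size N; a is the plateau the curve approaches."""
    return a - b * math.exp(-population_size / c)


# Hypothetical parameters for illustration only.
A, B, C = 1.0e-3, 9.0e-4, 50.0

for n in (10, 20, 40, 80, 1000):
    print(f"N = {n:>4}: CMR ~ {cmr_model(n, A, B, C):.6e}")
```

With parameters of this kind the curve rises steeply for small populations and is indistinguishable from its plateau by N = 1000, which is the sense in which the CMR at population size 1000 can stand in for the maximal CMR.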

While Figure 3 is promising in that none of the biological
mutation rates are higher than the respective CMRs produced
by the simulation, the two sets of mutation rates differ
by between two and five orders of magnitude.

Increasing the number of genes produces 
biologically realistic CMRs 

As increasing gene length has been seen to decrease CMR,
increasing the number of genes was also expected to decrease
CMR. The simulation model was run with a minimal
yet biologically realistic gene length of 1000, with gene
number doubling from $n = 1$ up to $n = 8192$. The CMR was
recorded as the mutation rate at which 95% of 2000 runs
lost peak 0 for any of the possible $n$ genes. Figure 4 shows
the CMR decreases by up to three orders of magnitude as gene
number increases from 1 to 8192, bringing the CMR to within
an order of magnitude of the biological mutation rates listed
in Table 1. Curve fitting using R showed the results follow
quadratic curves; these can be seen to become closer as population
size is increased, indicating the decrease in the rate
of change of CMR with increasing population size seen in
Figure 2.
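
The quadratic fit on a log-log scale mentioned above was done in R. As a rough equivalent, the following sketch fits log(y) = c0 + c1 log(x) + c2 log(x)^2 by ordinary least squares using only the Python standard library. The function names are hypothetical and the data are synthetic, generated from known coefficients so that the recovery can be checked; they are not results from the study.

```python
import math


def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination
    with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * solution[k] for k in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_log_quadratic(x_values, y_values):
    """Least-squares coefficients (c0, c1, c2) of
    log(y) = c0 + c1*log(x) + c2*log(x)**2 via the normal equations."""
    lx = [math.log(x) for x in x_values]
    ly = [math.log(y) for y in y_values]
    rows = [[1.0, u, u * u] for u in lx]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * v for r, v in zip(rows, ly)) for i in range(3)]
    return solve_linear(ata, aty)


# Synthetic data: gene numbers doubling from 1 to 8192, with y generated
# from known (made-up) coefficients.
gene_numbers = [2 ** k for k in range(14)]
true_coeffs = (-5.0, -0.3, -0.05)
cmr_values = [math.exp(true_coeffs[0] + true_coeffs[1] * math.log(g)
                       + true_coeffs[2] * math.log(g) ** 2)
              for g in gene_numbers]

print(fit_log_quadratic(gene_numbers, cmr_values))
```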

Figure 2: CMR when the GA was run for one gene
with a sequence length of 2000 and 20000. The exponential
line was obtained using the equation $y = A - B e^{-(N/C)}$
(with $N$ being population size), and the parameters
determined by curve-fitting using R with a least squares
method.

Population size 10 was also run with 25,000 genes of
length 1000 or 2000 to bring the gene number to within
the correct range for A. thaliana (The Arabidopsis Genome
Initiative, 2000; Bevan and Walsh, 2005). Increasing the
gene length decreased the CMR further to within an order
of magnitude of the per base per generation mutation rate
for A. thaliana, which is given as $7.1 \times 10^{-9}$ (Table 1).
Figure 4 shows per base mutation rate estimates for A. thaliana,
C. elegans, and D. melanogaster taken from Table 1, each of
which is within an order of magnitude of the simulation results
for 25,000 genes for population size 10. It is notable
that the genome size estimates for multicellular eukaryotes
used in Figure 4 are based on numbers of protein coding
genes. Protein coding sequences account for a relatively
small proportion of the total genome length in such organisms
(1.2% in humans (Consortium, 2012)), but much more
of the sequence is functional at some level, probably at least
9% (Ward and Kellis, 2012), with estimates of up to 80% in
humans (Consortium, 2012; Lu et al., 2015) (albeit this last
figure is likely to be a substantial over-estimate (Graur et al.,
2013)). This means that the genome size at which these biological
mutation rates are plotted in Figure 4 is a minimal
estimate, the true value being substantially, perhaps an order
of magnitude, higher, therefore putting their observed mutation
rates closer to the CMRs estimated by simulation.
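
The size of this correction can be checked with simple arithmetic. The sketch below uses only the human percentages quoted above (1.2% protein coding, at least 9% functional, up to 80% as an upper estimate); the function name is hypothetical, and treating the ratio of fractions as a multiplier on the plotted genome size is a simplifying assumption.

```python
import math

CODING_FRACTION = 0.012            # protein coding share of the human genome
FUNCTIONAL_FRACTION = 0.09         # conservative functional share
UPPER_FUNCTIONAL_FRACTION = 0.80   # upper estimate, likely an over-estimate


def size_multiplier(functional_fraction, coding_fraction=CODING_FRACTION):
    """Factor by which a coding-only genome size underestimates the
    functional sequence length, under the stated simplification."""
    return functional_fraction / coding_fraction


for label, fraction in (("conservative", FUNCTIONAL_FRACTION),
                        ("upper", UPPER_FUNCTIONAL_FRACTION)):
    factor = size_multiplier(fraction)
    print(f"{label}: x{factor:.1f} ({math.log10(factor):.2f} orders of magnitude)")
```

The conservative figure gives a factor of about 7.5, a little under one order of magnitude, which matches the "perhaps an order of magnitude" wording in the text.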


Discussion 

Figure 3: Maximal CMR plotted alongside biological per
base per generation mutation rates for one gene with
varying sequence lengths. Maximal CMR is the CMR
recorded for population size 1000 in the simulation, representing
the point at which the curve for each gene length
has levelled out (e.g., Figure 2). The biological mutation
rates were taken from Table 1 for eukaryotic species with
comparable gene lengths to the sequence lengths used in the
simulation.

Aston et al. (2013) showed that population size influences
the CMR that can be tolerated before fitter individuals are
outcompeted by those that have a greater mutational robustness
in both haploid and diploid artificial populations, a result
which has now been demonstrated to have relevance beyond
artificial systems. Gene lengths given in Derelle et al.
(2006), Sharma et al. (2005) and Lewin (2008) show that the
sequence length of 30 used in Aston et al. (2013) is significantly
smaller than the length of genes observed in biological
species. The mutation rates in Table 1 are also many
orders of magnitude lower than the CMR reported in Aston
et al. (2013). Hypothesis 1 stated that the CMR will always
have an exponential dependence on population size, while
hypothesis 2 stated that increasing the sequence length will
lower the CMR; Figure 2 supports these hypotheses as it shows
that increasing the gene length by a factor of 10 decreases
the CMR by a factor of 10, with each gene length resulting
in an exponential fall in CMR with decreasing population
size. A change in order of magnitude can also be seen in
the biological values. For example, the mean gene length
of A. thaliana is given as 2232 bp (Derelle et al., 2006),
while the average gene length of humans is just over 10
times longer at 27 kbp (Lewin, 2008). In Table 1, the per
base per generation mutation rate for A. thaliana is given as
$7.1 \times 10^{-9}$, while the per base per generation mutation rate
for humans is an order of magnitude higher at $2.5 \times 10^{-8}$.
Figure 3 shows that, when compared with mutation rates for
biological species with comparable gene lengths, the CMR
is always higher, as expected. This is a key contribution of
the study as it indicates the CMR exhibits a relationship with
gene length comparable to that previously determined for the
ET (Nowak, 1992; Ochoa et al., 2000; Ochoa, 2006); the
consistency between the known results for the ET and our
new results for the CMR increases confidence in the study.

Figure 4: CMR plotted alongside gene number for varying
population sizes. Data are shown for population sizes
10 to 80 with results plotted on a log-log scale. Gene length
was kept constant at 1000, while gene number was doubled
from 1 up to 8192. The corresponding quadratic lines
were obtained by curve-fitting using R (specifically Qmod
<- lm(log(ydata) ~ log(xdata) + I(log(xdata)^2))). Population
sizes shown represent the steep part of the curve in
Figure 2 before it levels out. Population size 10 was also
run with 25,000 genes, the correct range for A. thaliana. Gene
length was set to 1000 to match the other runs or 2000 to
bring it closer to A. thaliana's gene length. For reference, the
range of per base mutation rates from Table 1 is shown for
A. thaliana, C. elegans (nematode worm), D. melanogaster
(fruit fly) and humans (with gene number estimates from
The Arabidopsis Genome Initiative (2000), Nam and Bartel
(2012), Ashburner and Bergman (2005), and Consortium
(2012) respectively). The mean gene size of A. thaliana is
2232 bp (Derelle et al., 2006), the median gene length for
C. elegans is ~1700 b (Cutter et al., 2009), the average gene
length for D. melanogaster is 1130 b, and for humans 27 kb
(Lewin, 2008).

While it is clear that increasing sequence length decreases
the CMR, the CMR remains between two and four orders of
magnitude higher than the biological mutation rates (Figure
3). Hypothesis 3 stated that increasing the number of genes
will lower the CMR. Gene numbers within biological ranges
were expected to lead to CMRs close to the range of mutation
rates observed for biological species. Consistent with
this, Figure 4 demonstrates that when gene length is kept
constant, doubling the number of genes leads to a reduction
in the CMR at which 95% of runs lose peak 0 for at least one
gene. The magnitude of this reduction is variable, but occurs
across all population sizes shown in Figure 4. The population
sizes shown represent the steepest part of the curve in
Figure 2 before it levels out.

It is expected that biological organisms have evolved to
mutate below the CMR; mutation in loss-of-function alleles
will have less of an impact on fitness compared with the
same level of mutation in a functional allele. This means
peaks of lower fitness and greater mutational robustness can
be expected to exist in real-life fitness landscapes (independent
of the potential effect of epistasis). Real biological organisms
therefore have the potential to lose higher fitness
peaks at the CMR. There is a lower limit on mutation rate
as defined by the drift-barrier hypothesis; it is therefore expected
that biological mutation rates will exist somewhere between
this lower limit and the CMR. At some point(s) in
parameter space biological mutation rates and CMRs will
come close; it is expected mutation rates will be just below
the CMR in at least some cases.

Figure 4 shows a drop in CMR of the order of three orders of
magnitude as gene number increases from 1 to 8192. This is a
key contribution of the study as it brings the CMR to within
an order of magnitude of the biological mutation rates listed
in Table 1. The decreasing CMR shown in Figure 4 indicated
that increasing the gene number further would bring
the CMR directly into the range of biological mutation rates.
To test this, population size 10 was run with 25,000 genes of
length 1000 or 2000 to bring the gene number and length
to within the correct range for A. thaliana. This decreased
the CMR further to within an order of magnitude of the per
base per generation mutation rate for A. thaliana (Table 1).
Figure 4 also shows per base mutation rate estimates for
C. elegans, humans, and D. melanogaster taken from Table
1, all of which are also within an order of magnitude of the
simulation results for 25,000 genes. The mutation rates for
A. thaliana and C. elegans are at or below the predicted CMR,
while that of D. melanogaster is slightly higher but likely to be
below the predicted CMR for a population size greater than 10,
based on the trend in Figure 4. This is an important contribution;
it is a demonstration that, in a system in which an
individual’s fitness is dependent on the minimum fitness of
its $n$ constituent genes, it is possible to input biologically
realistic parameter values for a specific organism into the
simulation model and produce a CMR within the range of
current biological estimates of mutation rate for that organism.

Bringing the CMR into the biological range is a very
important step in the development of an in silico model
to directly model the evolution of biological species. Future
work will require further optimisation of the simulation
model to make run times feasible. The current study
had a high level of neutrality due to the small width of the
peaks relative to the size of the adapting sequences. Varying
the width of the peaks and distance between them provides
a potential future study into the effects of neutrality
on the CMR. It should also be noted that eukaryotic organisms
such as those discussed here have their DNA organised
into chromosomes, for example, the five chromosomes
of A. thaliana (The Arabidopsis Genome Initiative, 2000).
This gap in the current model presents a potential for further
development of the model and future study of the effect
of recombination on the CMR. Prediction of the CMR for
populations of varying sizes will enable identification of the
optimum mutation rate, a crucial parameter in the evolution
of small populations where CMR is known to vary significantly
(Aston et al., 2013); this has the potential to influence
understanding of populations undergoing a bottleneck or under
stress, and subsequent conservation strategy for populations
on the brink of extinction.
Abstract

Using the RAevol model we investigate whether the molecular
complexity of evolving organisms is linked to the “complexity”
of their environment. Here, complexity is taken to be
the number of different states an environment can have.
Results strikingly show that the number of genes acquired
by an organism during its evolution does not increase when
the number of states of the environment increases, but that
the connectivity of its genetic regulation network does.
In contrast, we show that the mutation rate has an
important influence on the gene content. We interpret these
results as a complex intertwining of direct selective pressures
(the more genes, the better the organisms can be) and robustness
and drift thresholds that limit the maximum number of
genes at different values depending on the mutation rates.

Introduction 

Since the discovery of the huge diversity in genome size and
number of genes, even among prokaryotes, two questions have
continued to interest the scientific community: on the one
hand, the origin of this diversity and, on the other hand,
whether genome size and number of genes scale with the
apparent complexity of the organism.

This second question has had an interesting history.
First, the attempt to link genome size to the apparent complexity
of organisms was a total failure, known as the
C-value paradox (Eddy, 2012). Then, the discovery
of genes and non-coding DNA seemed to solve this
paradox: it was proposed that it is the number of genes
that scales with the apparent complexity of an organism.
However, this attempt also failed, leading to the “N-value
paradox” (Claverie, 2001). Finally, the more recent notion
of the “gene regulatory network” introduced an important
complexification of the genotype-to-phenotype mapping, apparently
resolving the N-value paradox. Indeed, the phenotypic
complexity of an individual can be explained by the number
of different states its gene regulatory network can have, as
shown in Lohaus et al. (2007). Moreover, further studies of
prokaryotic regulation networks have shown that the number
of transcription factors scales quadratically with the total
number of genes (Molina and van Nimwegen, 2009).


Strikingly, the question of the origins of the diversity of
genome size and number of genes has received less attention,
as it is generally assumed that differences in the complexity
of the environment faced by an organism could account
for the complexity of the genotype and of the regulation network.
Indeed, it is quite intuitive that the more environmental
conditions an organism is likely to face, the more enzymes it
needs and the more transcription factors it needs to regulate
its metabolic activity (Maslov et al., 2009). This intuition is
supported by experimental data in which the classification of
bacteria according to their lifestyle shows a correlation between
the variability of their environment and their number
of genes (Parter et al., 2007).

However, correlation does not imply causation. In particular,
there are many other differences between these bacteria,
such as their population size and mutation rate. In order to
understand the causal link between environmental complexity
and genotypic complexity, one should search for
organisms of similar biology, mutation rate and population
size that have evolved in environments of various complexity,
an almost impossible quest. An alternative is to use in
silico experimental evolution (ISEE) to address the question.
Indeed, ISEE makes it possible to simulate long-term evolution
in perfectly controlled environments with an artificial chemistry
that, though abstract, is shared by all the lineages. In other
words, using ISEE one can compare the fate of organisms
that differ by only a single parameter, be it the mutation
rate, the population size or the complexity of the environment
they have to cope with to survive. For instance,
Bentkowski et al. (2015) developed a model of evolution
to address this question. They simulated environments with
different variability and observed that in more variable environments
organisms have more genes than in simpler environments.
However, this model does not take regulation into
account (although a quadratic regulatory cost is assumed),
leading to the direct necessity to have more genes if the environment
varies more or more frequently.

In this paper, we propose to address the question of 
the link between environmental complexity and number of 
genes (and more generally other indicator of complexity) 
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using the RAevol model (Beslon et al., 2010a,b). Indeed, 
this model is able to simulate rich environmental dynam¬ 
ics and uses individuals with explicit genome coding for 
a metabolism and for a gene regulatory network that can 
adapt in real time this metabolism to different environmen¬ 
tal conditions. In a preliminary study, RAevol has already 
been used to study the link between the mutation rate and 
the complexity of genomes and regulation networks: By 
evolving similar organisms with different mutation rates but 
a constant stable environment, we have been able to show 
that, at least in constant conditions, the mutation rate drives 
the complexity of the genome and the complexity of the reg¬ 
ulation network (Beslon et al., 2010b). In particular, we have 
shown that, even in a constant environment, lower mutation 
rates lead to a larger number of genes and to a quadratic in¬ 
crease of the number of transcription factors, strikingly re¬ 
producing many observations by acting on only one single 
parameter (Knibbe et al., 2011). Yet, this preliminary study
did not allow us to draw conclusions about the link between environmental
complexity and genotypic complexity as it was conducted in 
a constant environment. 

In the experiments presented here we evolved organ¬ 
isms in different environmental conditions to be able to test 
whether an increase of environmental complexity effectively 
leads to an increase of genomic/transcriptomic complexity. 
As we have already shown that the mutation rate is likely
to be an important factor, we simultaneously explored three
levels of environmental complexity and four mutation
rates. For each combination of these two parameters, we
evolved organisms for 300,000 generations and measured 
the size of their genotype and the complexity of their reg¬ 
ulation networks. 

This paper is organized as follows: In the next section, we 
briefly present the RAevol platform. We then present our ex¬ 
perimental design followed by the results of the simulations 
and the discussion. Finally, we conclude and describe our 
future research directions. 

The Aevol - RAevol platform 

Aevol is an in silico experimental evolution platform 
(Hindre et al., 2012; Batut et al., 2013). It was designed 
to study how the evolutionary conditions shape the molecular
structure of an evolving organism (e.g., DNA length,
number of genes, operonic structures...) through direct and indirect
selective pressures. In Aevol, a population of individuals
evolves through a classical mutation-selection process.
The specificity of Aevol lies in the genotype-to-phenotype 
mapping that finely models what is observed in bacteria. A 
circular double-stranded DNA sequence is transcribed into a 
set of mRNAs. These mRNAs are then parsed in search for 
Coding DNA Sequences (CDSs - the “genes”) that are trans¬ 
lated into proteins through an artificial genetic code. Finally, 
the proteins are combined to compute the individual’s phenotype.
We refer the reader to previously published work for
a complete description of the model and the results obtained
so far (Knibbe et al., 2007, 2008; Parsons et al., 2010; Batut 
et al., 2013; Misevic et al., 2015). 

RAevol is an extension of Aevol (Beslon et al., 2010a,b). 
It uses the same genome model and the same genetic code. 
However, in RAevol, proteins are able to act as transcription
factors (TFs) in addition to their metabolic activity. When acting
as a TF, a protein may up- or down-regulate the transcrip¬ 
tion of other genes, ultimately controlling the concentration 
of the proteins encoded by these genes. In other words, in 
RAevol, each individual owns a genetic regulation network 
that may dynamically modify its phenotype depending on 
the environmental conditions. Importantly, in RAevol, the 
phenotype of an organism is no longer a static function (as 
it is in Aevol). Rather, it becomes a dynamic function that 
can be evaluated (by comparing it to a target function) at dif¬ 
ferent time steps during what can now be considered as the 
“life” of the individual. Technically, RAevol extends Aevol 
by adding a “transcriptional regulatory code”. In RAevol, 
the target phenotype may change during the life of the in¬ 
dividual, either deterministically or randomly (Figure 1). 
Moreover, specific proteins with no metabolic activity can 
be added into the individual in order to allow it to sense this 
variation in real time. The individual must (and can) dy¬ 
namically adapt to the current target by switching between 
different stable states of its regulation network. The final 
fitness of an individual is the mean value of its fitness mea¬ 
sured at each evaluation time step. 
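
The lifetime evaluation described above can be sketched as follows. This is only an illustration under stated assumptions, not RAevol's implementation: the function names, the sampling of the phenotype on a fixed grid and the exponential form `exp(-k * gap)` of the per-step fitness are hypothetical; only the averaging over evaluation time steps is taken from the text.

```python
import math

def gap(phenotype, target):
    """Mean absolute difference between two curves sampled on the same grid."""
    return sum(abs(p - t) for p, t in zip(phenotype, target)) / len(target)

def step_fitness(phenotype, target, k=1.0):
    """Hypothetical per-step fitness: decreases exponentially with the gap."""
    return math.exp(-k * gap(phenotype, target))

def lifetime_fitness(phenotypes, targets, k=1.0):
    """Final fitness: mean of the fitness measured at each evaluation step."""
    values = [step_fitness(p, t, k) for p, t in zip(phenotypes, targets)]
    return sum(values) / len(values)
```

An individual that matches the target at every step obtains the maximum value, while any step spent far from the current target lowers the mean.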

Experimental design 

For this study, our starting point was the work done in our 
previous experiment with constant environments (Beslon 
et al., 2010b). In this previous work, populations of 1000 organisms
were evolved for 15,000 generations in a constant
environment under 6 different mutation rates ranging from
$2 \times 10^{-4}$ to $5 \times 10^{-6}$ mut/bp/generation. We used a similar
experimental design but adapted it to test evolution in variable
environments. In particular we significantly increased the
length of the evolution experiment. Since evolution is likely
to be much slower in variable environments than in
constant ones, we let each simulation evolve for 300,000
generations. We tested 4 mutation rates ($5 \times 10^{-4}$, $1 \times 10^{-4}$,
$5 \times 10^{-5}$ and $5 \times 10^{-6}$ mut/bp/generation).

Another difference between both experiments is that we 
now use a fitness proportionate selection scheme instead 
of the exponential-rank-based selection process used in 
(Beslon et al., 2010b). The fitness proportionate selection 
scheme is more realistic from a biological point of view 
since it allows evolution to switch from directional selec¬ 
tion to purifying selection. Finally, the maximum value for 
the protein pleiotropy ($w_{max}$) has been increased from 0.03
to 0.05 to limit the number of genes and allow for faster
computations (both changes reduce the maximum number
of genes an organism can - or needs to - acquire during its
evolution).

Figure 1: Example of an individual after 300,000 generations in environment 16 (16 different conditions and 4 signals). (A)
Genome, mRNAs and genes (colors code for the basal transcription rate; the small colored dots are the genes). (B) Genetic regulation
network (arrows indicate links between genes and mRNAs; red arrows are inhibiting links, green arrows are activating links;
black dots are the genes, green dots are the signaling proteins). (C) Protein concentrations over the lifetime of the individual
(colors indicate the function of the protein). The four signaling proteins are displayed above the graph. (D) Dynamic phenotype
of the individual (same color code as panel C; thin red lines indicate the target at each time step). This individual evolved in
environment 16 (i.e. the four Gaussians can switch independently between their resting and their active states). Here the green
Gaussian is active from t = 0 to t = 6; the orange Gaussian is active from t = 0 to t = 12 and the blue and red Gaussians are
active from t = 7 to t = 12. From t = 13 to t = 20 all Gaussians are inactive. At each environmental change the regulation
network (B) modifies the transcription levels, hence the protein concentrations (C), resulting in a phenotypic adaptation (D).
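
The difference between the two selection schemes mentioned above can be illustrated with a short sketch. It is not the Aevol/RAevol code: the function names and the rank base `c = 0.9` are assumptions chosen for illustration, and the rank-based form (weights 1, c, c^2, ... from best to worst) is one common way to write an exponential ranking.

```python
def fitness_proportionate(fitnesses):
    """Reproduction probability proportional to the fitness value itself."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]

def exponential_rank(fitnesses, c=0.9):
    """Reproduction probability depends only on the rank of each individual.

    The best individual gets weight 1, the next c, then c**2, and so on
    (c is a hypothetical value, not one reported in the paper).
    """
    order = sorted(range(len(fitnesses)), key=lambda i: fitnesses[i], reverse=True)
    weights = [0.0] * len(fitnesses)
    for rank, i in enumerate(order):
        weights[i] = c ** rank
    total = sum(weights)
    return [w / total for w in weights]
```

With nearly equal fitness values, the proportionate scheme yields nearly uniform probabilities, whereas the rank-based scheme keeps the same gradient whatever the actual fitness differences. This may be one way to read the remark about switching from directional to purifying selection, but it is offered only as an illustration.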

Finally, in order to be able to compare the evolutionary
outcome in environments of different complexity, we carefully
designed our dynamic environments. In a previous study
with RAevol (Vadee-Le-Brun et al., 2015) we used an environment
represented by 4 Gaussians whose maximum values
were changed randomly during the evolutionary process. We
kept the same idea of having 4 Gaussians, each one associated
with a signaling protein informing the individuals of its
variation. However, the environmental conditions were created
by moving the mean of the Gaussians along the x axis:
The Gaussians all have the same height (0.3) and standard
deviation (0.05). In their “resting state”, they are regularly
spread along the x axis (m = 0.2, 0.4, 0.6 and 0.8). The
constant environment is thus the same as in (Vadee-Le-Brun
et al., 2015). But Gaussians are able to switch to an “active
state” by increasing their mean value (m) by a small amount
(0.05) and simultaneously sending a signaling protein to the
individuals (Figure 1) such that it can trigger a change of
state of their regulation network. The lateral variation has been
chosen instead of the vertical variation used so far because
it keeps the total area of each phenotypic target constant.
Indeed, Aevol/RAevol are prone to a “filling artifact”: Since
the maximum area of a gene is bounded, the larger the area
of the phenotypic target, the higher the number of genes
individuals are likely to acquire. By using a lateral variation
scheme, we kept the area of the target constant whatever the
number of Gaussians that are in their “resting” or in their
“active” state. Using this variation scheme and considering that
the number of environmental conditions is a proxy for the
environmental complexity, we are able to build a large variety
of environments whose complexity varies from 1 to 16
states depending on which Gaussians are allowed to be in an
active state and on possible coordination in the resting-active
switch of the Gaussians. Here we tested three different
complexity levels: 1 (actually a constant environment), 2 (only
Gaussian 3 is allowed to be in its active state) and 16 (all
4 Gaussians switch independently between their resting and
active states). For all environments but the constant one, the
switch between different targets happens with a 10% chance
at each time step. As individuals live for 20 time steps, on
average each individual faces 2 switches of phenotypic target
during its lifetime (note that the environmental variation
is synchronized for all the individuals of the population) but
this number can vary widely. We label each environment by
its number of phenotypic targets.

Figure 2: Mean distance to target of the best individuals
from the 4 seeds between 299,000 and 300,000 generations.
Both axes are in log scale. The horizontal red dashed line is
the distance to target of an individual having no gene.
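
The environment design described above can be sketched in a few lines. The numerical values (height, standard deviation, resting means, shift, switch probability, lifetime) are those given in the text; everything else is an assumption made for illustration: the target is taken as the plain sum of the four Gaussians, all Gaussians start in their resting state, and at a switch event each switchable Gaussian is redrawn at random. The function names are hypothetical and the Gaussians are indexed from 0, so "Gaussian 3" is index 2.

```python
import math
import random

HEIGHT, SD, SHIFT = 0.3, 0.05, 0.05       # values given in the text
RESTING_MEANS = (0.2, 0.4, 0.6, 0.8)      # resting positions on the x axis
SWITCH_PROB, LIFETIME = 0.1, 20           # 10% chance per step, 20 steps

def target(x, active):
    """Target value at x; `active` holds one boolean per Gaussian."""
    total = 0.0
    for m, on in zip(RESTING_MEANS, active):
        mean = m + SHIFT if on else m     # lateral shift keeps the area constant
        total += HEIGHT * math.exp(-((x - mean) ** 2) / (2 * SD ** 2))
    return total

def lifetime_states(switchable, rng):
    """Environmental state at each time step of one lifetime.

    `switchable` lists the indices of the Gaussians allowed to become active:
    [] for environment 1, [2] for environment 2, [0, 1, 2, 3] for environment 16.
    """
    state = [False] * 4
    history = []
    for _ in range(LIFETIME):
        if switchable and rng.random() < SWITCH_PROB:
            for i in switchable:          # assumption: independent random redraw
                state[i] = rng.random() < 0.5
        history.append(tuple(state))
    return history
```

Under these assumptions the number of reachable states is 2 raised to the number of switchable Gaussians, which gives the labels 1, 2 and 16 used for the three environments.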

Finally, all simulations were repeated four times with different
random seeds for each pair of environmental
complexity and mutation rate, leading to a total of 4 × 3 × 4 =
48 simulations. Note that four repetitions are not enough
to obtain statistically sound comparisons between the different
situations but, given the structure of RAevol, we were
not able to increase the number of repetitions as the amount
of computational power required to conduct the experiments
started to become prohibitive¹.
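
As a quick check of the size of this design, the full grid and the number of network evaluations quoted in the footnote can be recomputed. The variable names are ours; the values come from the text.

```python
from itertools import product

environments = [1, 2, 16]                    # number of phenotypic targets
mutation_rates = [5e-4, 1e-4, 5e-5, 5e-6]    # mut/bp/generation
seeds = range(4)                             # four repetitions

# One run per (environment, mutation rate, seed) combination
runs = list(product(environments, mutation_rates, seeds))

generations, population, lifetime = 300_000, 1000, 20
evaluations = len(runs) * generations * population * lifetime
print(len(runs), evaluations)
```

This gives 48 runs and 288,000,000,000 individual time steps, in agreement with the figure of 288 billion stated in the footnote.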

Results 

Among the 48 simulations, 46 resulted in well-adapted indi¬ 
viduals that were able to properly fit the phenotypic target. 
The two simulations that “failed” are simulations with the 
highest mutation rate in the most complex environment. In 
this case, as Figure 2 shows, the individuals were not able
to significantly reduce the distance to the target. Their genetic
structure thus drifted, eventually leading to very small
genomes that can collapse to 0 genes.

Figure 3 shows the number of genes for each mutation 
rate and for each environmental complexity. It shows the 
same trend we observed previously (Beslon et al., 2010b): 
The number of genes increases as the mutation rate decreases.
However, the number of genes is lower than what
was observed in (Beslon et al., 2010b), in particular under
low mutation rates. This is likely to be due to the selection
scheme we use and to the larger maximum value of the protein
pleiotropy. Interestingly, the evolution in complex environments (environment
“16”, black squares) results in a similar linear trend (in log-log)
but with a very different coefficient since the highest
mutation rates lead to a very low number of genes in this
environment. This confirms the previous observation that in
complex environments organisms are not able to evolve
efficiently under high mutational pressure.

¹ 48 simulations, 300,000 generations and 1000 individuals living
for 20 time steps indeed lead to a total of 288 billion time
steps, at each of which we need to compute the network dynamics
and the resulting phenotype of an individual.

Figure 3: Distribution of the number of genes of each simulation
versus its mutation rate in log-log scale. For each dot,
the corresponding value is the mean number of genes
of the best individual between 299,000 and 300,000 generations.
For each environment a log-log linear regression is
plotted.
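
The log-log linear regression plotted in Figure 3 amounts to a least-squares line through the points (log10 of the mutation rate, log10 of the number of genes). The sketch below shows the computation. The gene counts are synthetic values generated from an exact power law purely for illustration; they are not results from the simulations, and the function name is hypothetical.

```python
import math

def loglog_fit(x, y):
    """Least-squares line through (log10 x, log10 y); returns (slope, intercept)."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, my - slope * mx

rates = [5e-4, 1e-4, 5e-5, 5e-6]
# Synthetic gene counts following genes = 2 * rate**(-0.25), illustration only
genes = [2.0 * r ** -0.25 for r in rates]
slope, intercept = loglog_fit(rates, genes)
```

On such data the fit recovers the exponent of the power law as the slope; a "very different coefficient" between environments corresponds to a different slope of this line.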

For better readability, all the following results are shown as
bar plots grouped by mutation rate. Figures 4 and 5
summarize the effect of the different parameters tested on the
genomic structure. Figures 6 and 7 summarize their effect on 
the structure of the regulation network. Two opposite trends 
are visible in these figures. First, and quite surprisingly, the 
number of genes decreases when the variability of the en¬ 
vironment increases and this effect is much stronger for the 
most complex environment. Furthermore this effect tends to 
be enhanced for high mutation rates. Second, when looking
at the size of the non-coding sequences (Figure 5), the
mean connectivity of the network (Figure 6) and the mean
link value (Figure 7) we see an opposite trend: complex
environments lead to longer non-coding sequences (whereas in
simpler environments we only observe large non-coding
sequences under a $5 \times 10^{-6}$ mutation rate), higher connectivity²
and stronger links in the regulation network (except for
the highest mutation rate). Finally, a more complex trend
is observed when looking at the size of the transcribed
sequences (Figure 8): As already shown in (Parsons et al.,
2010), in “simple” environments, the higher the mutation
rate, the longer the RNAs. However, for the complex
environments this trend is inverted and the RNAs tend to be
shorter as the mutation rate increases.

Figure 4: Mean number of genes of best individuals from
the 4 seeds between 299,000 and 300,000 generations for all
mutation rates and environments. The error bars represent
the standard deviation between seeds.

Figure 9 summarizes the two main trends observed in our 
experiments. It clearly shows a general tendency for complexity
to increase as the mutation rate decreases. Yet, the environmental
complexity acts in two opposite ways: It tends
to decrease the number of genes (Figure 9, left panel) but to 
increase the network connectivity (Figure 9, right panel). 

Discussion 

To summarize these results: the variability of the environment
tends to reduce the number of genes but increases
the non-coding part of the genome. This effect increases 
with the mutation rate. On the contrary, it tends to increase 
the connectivity of the gene regulatory network and the in¬ 
tensity of the links and to decrease the size of mRNAs (when 
the gene regulatory network is efficient enough). Globally, 
individuals in a more variable environment evolve gene reg¬ 
ulatory networks that are smaller but more connected and 
with stronger links. 

² Note that, in RAevol, the probability for a coding sequence
to be a transcription factor is quite high. Thus a 0.1 connectivity
should be considered as a neutral regulation.

Figure 5: Mean size of non-coding sequences in bp of best
individuals from the 4 seeds between 299,000 and 300,000
generations for all mutation rates and environments. The
error bars represent the standard deviation between seeds.

Such complex effects are likely to originate from the
intertwining of multiple forces acting on genomic and transcriptomic
complexity. Indeed, one can identify four main
forces that either limit or increase the molecular complexity
of the organisms:

Selective forces: This is the most intuitive effect. The more
complex the environment, the more genes are needed to
fit the different environmental states and the more regulation
is needed to switch between environmental states.

Robustness thresholds: Under high mutation rates, long
DNA molecules are impossible to transmit to the next
generation, hence limiting the maximum complexity of the
genome. This effect is due to the classical error threshold
that limits the length of the coding sequences (Eigen, 1971)
and to a more recently identified threshold imposed on the
total chromosome length by the rate of chromosomal
rearrangements (Knibbe et al., 2007; Fischer et al., 2014).

Indirect selection for evolvability: When evolving in continuously
varying environments, selection can indirectly
favor individuals that increase their rate of variation because
they are able to continuously produce mutants that
may follow the environmental variations (Earl and Deem,
2004).

Drift barrier: The level of selection may impose a barrier to
the evolution of the genomic content: If the contribution
of all new genes to the fitness is quasi-neutral given the
level of selection, then the selective forces vanish, thus
imposing a limit to the acquisition of new genes.

Figure 6: Mean connectivity of the gene regulatory network
of best individuals from the 4 seeds between 299,000
and 300,000 generations for all mutation rates and environments.
The error bars represent the standard deviation between
seeds.

We suggest that the observed effects are a complex combination
of these four forces. Indeed, as we have already
shown with RAevol (Beslon et al., 2010b), the robustness
threshold imposes a severe limit on the size of the
genomes and on the number of genes. This explains the general
trend in the number of genes observed in Figures 3, 4
and 9. Indeed, being driven by the spontaneous mutation
rate, as stated by Fischer et al. (2014), this effect is likely
to be independent of the selection process, hence of
the complexity of the environment. Now, as also stated by
Fischer et al. (2014), when the genome size is far from the
threshold, selective forces can fully play their role. This is 
indeed the case when the mutation rate is low. Interestingly, 
in this situation, the regulation network appears to be more 
connected (Figures 6, 7 and 9) suggesting that under low 
enough mutation rates selective forces drive the complexity 
of the regulation network. However this complexity is only 
visible in terms of component connectivity but not in terms 
of component number (contrary to the number of genes, the 
connectivity can be increased without increasing the length 
of the genome). The combination of these two effects can 
also explain the trend observed on the length of the RNAs. 
Indeed, as the mutation rate increases the length of the RNAs 
increases. As the gene size is constant, the RNA size is an 
indicator of operon structures that allow for a more com¬ 
pact genome at the cost of less regulation possibility. Inter¬ 
estingly, in the more variable environment, the increase of 
mutation rate has no impact on the RNA size, suggesting a 
selection for a more active regulation. 

Figure 7: Mean intensity of links in the gene regulatory network
of best individuals from the 4 seeds between 299,000
and 300,000 generations for all mutation rates and environments.
The error bars represent the standard deviation between
seeds.

The last two surprising results are the large increase of
the size of the non-coding sequences under complex environments
and of course the lower number of genes observed
in more complex environments. Although more speculative,
we propose two hypotheses to explain these results.

First, when the complexity of the environment and the 
mutation rate simultaneously increase, the selective force 
and the robustness threshold become more and more antag¬ 
onistic. Ultimately, as stated above, the robustness threshold 
imposes a severe limit to the complexity of the genotype, 
forbidding a regulation network to evolve. In highly variable 
environments the genomes ultimately collapse leading to a 
quick drop of the number of genes (Figure 4). In such a situation,
the sole option for evolution is to increase the level
of variability. Since the mutation rates are fixed in our simulation,
this can only be done by increasing the size of the
non-coding sequences, hence increasing the dynamic of the 
gene repertoire. In conclusion, when the robustness thresh¬ 
old contradicts the selective forces, the response of evolution 
is an indirect selection for evolvability. 

Figure 8: Mean size of mRNAs in bp of best individuals
from the 4 seeds between 299,000 and 300,000 generations
for all mutation rates and environments. The error bars represent
the standard deviation between seeds.

Second, we observe that the number of genes accumulated
in environment 16 is always lower than what is observed
in environment 2 but that the difference tends to decrease
as the mutation rate decreases. We propose that, under low
mutation rates, the complexity limit is no longer imposed by
robustness constraints but rather by the drift barrier: As the
metabolic effect of a gene tends to decrease with the number
of genes (as each gene fills a smaller and smaller gap between
the phenotype and the phenotypic target) there must
be an upper limit to the number of genes selection can
act on (indeed, Figure 2 shows that the mean distance to
target saturates when the mutation rate is decreased). But
the drift barrier is likely to be more stringent in more variable
environments since the selection pressure on individual
genes also depends on the fraction of time they are useful.
In other words, the more variable the environment, the lower
the selection pressure on the genes corresponding to the variable
part of the phenotypic target and the lower the number of
genes the organisms can accumulate. To test this hypothesis
we ran simulations in environment 2 with a higher selection
pressure for the two lowest mutation rates ($5 \times 10^{-5}$ and
$5 \times 10^{-6}$). In these simulations, we used a selection coefficient
of 2,000 instead of 750 with again 4 repetitions each.
We indeed observed an increase of the number of genes. Moreover,
the gain was lower for the $5 \times 10^{-5}$ mutation rate (from
26 to 29 genes) than for the $5 \times 10^{-6}$ mutation rate (from 31 to
43 genes), supporting the hypothesis that for a $5 \times 10^{-5}$ mutation
rate the number of genes is limited by both the robustness
threshold and the drift barrier but that, for a mutation
rate of $5 \times 10^{-6}$, only the drift barrier is active.
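
The comparison of the two gains can be made explicit as relative increases. The gene counts are those reported above; the helper name is ours.

```python
def relative_gain(before, after):
    """Relative increase of the number of genes under stronger selection."""
    return (after - before) / before

gains = {
    5e-5: relative_gain(26, 29),   # from 26 to 29 genes
    5e-6: relative_gain(31, 43),   # from 31 to 43 genes
}
```

The gain is about 12% at the higher of the two mutation rates and about 39% at the lower one, which is the asymmetry the argument relies on.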

Conclusion and perspectives 

In this paper, we experimentally addressed the question of
the impact of environmental variability on evolution (i.e.,
whether more variable environments imply more complex
genomes and gene regulatory networks). To address this
question, we used an in silico experimental evolution platform,
RAevol, to test three environments of increasing variability
under four different mutation rates.

Our simulations confirm that the size of the gene reper¬ 
toire is bounded by the mutation rate and that this also lim¬ 
its the complexity of the gene regulatory network. They 
also show that environmental variability indeed increases 
the connectivity and the intensity of links of the gene regulatory
network but decreases its size. Moreover, environmental
variability increases the genome size by increasing
the non-coding part of the genome of individuals who fail 
to regulate their phenotype according to environmental vari¬ 
ations. Finally, we discussed our results and proposed that 
the molecular complexity of an organism is a complex com¬ 
bination of direct selective pressure, indirect selective pres¬ 
sure for evolvability and robustness and drift thresholds. 

The most striking result obtained here is that the gene
content of our organisms does not follow the environmental
complexity. However, this result is not so surprising if one
realizes that complexity here is defined as “variability”.
Indeed, as G.E. Hutchinson (1957) stated, environments are
complex objects composed of multiple factors (tempera¬ 
tures, light intensity, etc.). What we here defined as a more 
complex environment is not necessarily what evolution “per¬ 
ceives” as the highest complexity! Indeed, environmental 
features that don’t lead to selectable traits (because of drift 
or robustness constraints) are simply ignored by evolution. 
Since drift and robustness levels are dependent on the evo¬ 
lutionary conditions (population size, mutation rates...) one 
can then argue that the complexity of an environment depends
on the organism that evolves in it! Clearly, in our
simulations, the dynamics of environment 16 are
more complex than those of environment 2 and, of course,
1. But from a more global point of view, the constant environment
may be considered as more complex because the
target function is more strongly selected. The genetic ele¬ 
ments managing the dynamic part of the phenotype are in¬ 
deed more numerous in environment 16 but the gene content 
is larger in constant environments. 

The main limit of these results is the small number of parameters
tested and the very small number of repetitions for
each combination of parameters. We now need to statistically
validate our results and hypotheses by running more repetitions
and more scenarios: environments of intermediate
complexity (4 and 8), lower mutation rates and higher selection
strength. That objective has recently been taken a
step further with the release of Aevol 5.0 that includes an 
optimized parallel version of the simulator. 

Availability 

Aevol is available under GPL license at the project website: 
http://www.aevol.fr. RAevol is currently in beta-version and 
is available upon request from the authors. 
Abstract

Observing the gradual transition from “life” to “non-life” tells 
us a lot about the features of living systems. Based on this idea, 
we started simplifying natural cells by inactivating their genes 
randomly through experimental evolution. Escherichia coli (E. 
coli) cells were cultured in a high-mutagenicity environment to 
accumulate replication errors in their genomes. As a result, we
observed dozens of mutations that are expected to
deactivate gene expression. In addition, gene inactivation
has so far accumulated in proportion to time without growth defects.
These results suggest that naturally isolated cells are highly 
redundant in an experimental environment—implying the 
possibility of further simplification deleting hundreds or 
thousands of genes. 


Introduction

What is life? In order to answer this fundamental question,
there have been many attempts to create primitive living
systems from simpler components, with many challenges.
For example, building artificial cells from chemical materials
in a test tube provided experimental platforms to observe
unsophisticated cellular behaviors (Kurihara, et al. 2015). This
bottom-up approach (blue arrow in Fig. 1) would allow us to
observe the gradual transitions from non-life to life. However,
there are many technical difficulties in upgrading
such highly primitive artificial cells toward
complex and modern forms. On the other hand, the top-down
approach (green arrow in Fig. 1) can avoid such difficulties.
This approach simplifies modern cells by reducing their
sophisticated genes to retain only primitive cellular functions.
For example, in nature, Pelagibacter ubique, which has the
smallest number of genes among free-living bacterial strains
(1354 genes, Luo, 2015), is thought to have lost thousands of
genes through reductive evolution. This bacterium shows slower
growth than other related species regardless of the
amount of environmental resources, implying the loss of the
sophisticated ability to respond to environmental changes. In
the same way, gene loss in bacterial genomes would result in
more primitive behavior in various aspects. Thus, both
approaches are significant in exploring the transition between
living and non-living modes of primitive cells.

Here, we followed the latter approach to obtain gradually
simplified derivatives of E. coli by accumulating extensive,
random mutations in the genome, most of which are
destructive (Eyre-Walker and Keightley, 2007; Kacar and
Gaucher, 2012). Accordingly, the functional genes in the
genome were expected to gradually decrease from the initial
4000. In this abstract, we describe the design and progress of
the experiment, and discuss future issues and prospects.


Figure 1: Simplifying natural cells has some advantages over 
the opposite approach for observing the transition between 
living and non-living systems. 

Evolutionary Experiments 

Ultraviolet (UV) irradiation was applied in increasing doses
to increase the mutation rate. Previous research showed that
experimental evolution with periodic UV irradiation can 
increase gene inactivation in the E. coli genome (Shibai, et al. 
2014). We used the E. coli MDS42 strain where 15 % of its 
genes were manually deleted from the progenitor (Posfai, et 
al. 2006). Cells were cultured in minimal medium and showed 
exponential growth. On the other hand, increasing dosages of
UV irradiation kill cells at an exponential rate. Therefore, cell
concentration was constantly measured to control the timing 
of UV irradiation and prevent both extinction and saturation. 
The cells were subcultured every four days and glycerol- 
stocked at the same time. The experiment was replicated six 
times and continued for 168 days. 

Analysis of Evolutionary Changes 

We conducted whole genome resequencing for the evolved 
lineages on the 56th and 168th days. Mutations were detected
by comparing the DNA alignments with the ancestor’s. 
Mutations that caused stop codons at abnormal positions were 
counted as nonsense mutations. Additionally, the mutations 
that caused hazardous gaps in codon reading frames were 
counted as missense mutations. We regarded nonsense and
missense mutations as inactivating mutations. Genes with at least
one inactivating mutation were counted as inactivated genes,
though it is still not certain whether their functions were
completely lost. As a result, 19 to 94 genes were inactivated
over 168 days in each lineage (Fig. 2). The cell lineage with
the smallest number of inactivated genes (Series-1) already showed
highly aggregative growth by the 56th day. The maximum
growth rate of each lineage after the evolutionary experiment
was 0.68 to 0.86 $h^{-1}$, not significantly lower than the
ancestor’s (0.73 $h^{-1}$).
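
The counting rule described above (a gene is inactivated as soon as it carries at least one nonsense or missense mutation, in the sense these terms are given in the text) can be sketched as follows. The input format, the gene names and the type labels are hypothetical; the actual analysis pipeline is not described in the text.

```python
# Mutation categories treated as inactivating, named as in the text
INACTIVATING = {"nonsense", "missense"}

def inactivated_genes(mutations):
    """Return the set of genes carrying at least one inactivating mutation.

    `mutations` is an iterable of (gene_name, mutation_type) pairs.
    """
    return {gene for gene, kind in mutations if kind in INACTIVATING}

# Hypothetical example: geneA is hit twice but counted once
example = [
    ("geneA", "nonsense"),
    ("geneA", "missense"),
    ("geneB", "synonymous"),
    ("geneC", "missense"),
]
count = len(inactivated_genes(example))
```

Because the result is a set, a gene with several inactivating mutations is counted only once, which matches the "at least one" criterion.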



Figure 2: Number of inactivated genes during evolution (x axis: day).

Discussion 

Cells were estimated to have grown for more than 3000 generations over
168 days, assuming that growth rates higher than 0.6 $h^{-1}$ were
maintained. Dozens of genes, accounting for 0.5 to 2.5% of
all genes in the ancestral genome, accumulated
inactivating mutations. Surprisingly, even though the loss
of function occurred at such a scale, a significant drop in the 
maximum growth rate was not observed. This implies that 
many of the genes impacting the growth of E. coli in this 
evolutionary experiment were redundant. One of the possible 
causes of this redundancy is that the natural environments 
were more complex than the experimental environments, so 
that the E. coli can still function with the loss of those genes 
obtained through natural history. To examine this, additional 
analysis on the function of the inactivated genes is needed. 
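
The estimate of more than 3000 generations can be checked with a short calculation. It assumes that the quoted growth rate is a specific growth rate (per hour, natural-log basis) sustained continuously over the whole experiment, which ignores the killing by UV and any lag phase; the function name is ours.

```python
import math

def generations(growth_rate_per_h, days):
    """Number of doublings at a constant exponential growth rate."""
    hours = days * 24
    return growth_rate_per_h * hours / math.log(2)

n = generations(0.6, 168)
```

With a rate of 0.6 per hour over 168 days this gives roughly 3490 doublings, consistent with the lower bound of 3000 generations stated in the text.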

The lineage with the fewest inactivated genes
(Series-1) acquired the trait of increased cell aggregation.
We inferred that the outer cells of the aggregates protected
inner cells from the harmful UV irradiation, so that the outer
cells died and inner cells survived with a decreased number of
inactivating mutations. Such adaptive evolution toward
multicellular-like behavior is interesting, though it does not
serve the purpose of this study.

What will we see if we continue this evolutionary
experiment over a longer period of time? To a first-order
approximation, the number of functional genes will be about
1500 in 20 years, which is comparable to Pelagibacter ubique.
With 10 years of further evolution, the number of functional
genes would reach the level of Mycoplasma genitalium, which
has the smallest genome even among parasites (487 genes,
Choe et al., 2016), though the time-proportional
accumulation of gene inactivation would not last that long. How
would the behavior of the reductively evolved E. coli differ
from the behavior of natural organisms with a similar number
of genes? Also, as a complex adaptive system, how would its
features, such as energy efficiency or evolvability, change? To
answer these open questions, a discussion of how living
systems should be understood is needed alongside the
decades-long experimental evolution.

Materials and Methods 

In all the experiments of this study, cells were cultured in 
mM63 minimal medium at 37 °C with shaking. 

Evolutionary Experiments 

E. coli MDS42 was used as the ancestral strain. The bacteria
were incubated in a quartz test tube. The optical density (OD) of
the culture was measured every 3 min. Intensive UV irradiation
was applied each time the OD value had increased by 0.002 since
the previous irradiation. UV was applied from the bottom of
the tube. The dosage of each irradiation was set so that the
survival rate of the ancestral cells was 10⁻² to 10⁻³.
An aliquot of the culture was transferred into a new tube
with fresh medium at 4-day intervals, at a dilution rate of
10⁻². The cells were glycerol-stocked at the same time.

Analysis of Evolutionary Changes 

Purified genomic DNA samples of the cells were sequenced
on an Illumina MiSeq. Base-pair substitutions and short
insertions/deletions were identified using SAMtools.

Maximum growth rates were measured as the rate of increase
of the OD value during the exponential growth phase in the
absence of UV irradiation.
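
One common way to obtain such a rate is a least-squares fit of ln(OD) against time over the exponential phase. The sketch below is an assumption about how this could be done, not a description of the procedure actually used in this study; the readings are synthetic and the function name is illustrative.

```python
import math

def max_growth_rate(times_h, od_values):
    """Least-squares slope of ln(OD) against time, in h^-1."""
    logs = [math.log(od) for od in od_values]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    den = sum((t - mean_t) ** 2 for t in times_h)
    return num / den

# Synthetic readings every 3 min (0.05 h), generated with a rate of 0.73 h^-1.
times = [i * 0.05 for i in range(20)]
ods = [0.01 * math.exp(0.73 * t) for t in times]
rate = max_growth_rate(times, ods)
print(f"{rate:.2f}")
```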
Abstract

Computational reflection uses software architectures that are
capable of self-modification at runtime. These systems have
implementations between two extremes: procedural reflection,
in which unlimited self-modification is available at
the expense of infinite recursion; and declarative reflection,
which uses pre-defined metrics to drive the self-modification
and is hence limited in scope. Biological processes also exploit
the concept of reflection, where natural selection drives
the process of modification. The concept of a ‘program’ in
computing has an analogy with an individual member of a
species. The process of life is discretised into a series of autonomous
systems, each of which creates modified versions
of itself as offspring. This paper unifies the concept of computational
reflection with biological systems via a new analysis
of von Neumann’s Universal Constructor. The result is
a bio-reflective architecture that is capable of unconstrained
self-modification without the problems of infinite recursion
that exist in the computational counterparts. The new architecture
is a blueprint for applications in Artificial Life studies,
Evolutionary Algorithms, and Artificial Intelligence.

Introduction 

In this paper we unify certain concepts from computational 
reflection (Maes, 1987; Smith, 1984) and Artificial Life 
(ALife). These concepts address how self-representation, 
autonomy and evolution contribute to ‘living’ systems. Each 
of these topics has aspects that are represented in the idea 
of reflection - computing which is ‘about itself’ - and the 
manner in which the genotype simultaneously specifies and 
is maintained by the phenotype. 

As we describe below, computational reflection and biological
systems have many things in common. A model
of computational reflection gives new insight into biological
systems. In addition, biological systems give a new perspective
on the nature of computational reflection.

We introduce this topic with a summary of what reflection
means in computer science, and then go on to discuss the
implications in ALife.

Computation without reflection: First we present a
highly simplified model of Conventional Computing
(CCOMP), so that the concepts we present below have a
clear conceptual base.

In CCOMP, computers run programs that process data.
On execution, both the program and the data are held as binary
digits in RAM. The CPU ‘reads’ the program, which
‘acts upon’ the data. For our purposes, we can consider that
the CPU executes one instruction at a time, and that instruction
works on one data word. This is possible because the
sequence of operations is specified by the program, and the
data is organised into a set of related structures in RAM. If
the program is written correctly, it processes the data in the
manner intended, even though the CPU never ‘sees’ the entire
program or data at any one time. Although there is no
physical distinction between the program and its data, it is
usual for the two to be treated separately. The data is processed
by the program, meaning that some of it is changed
or manipulated to form the output; the program is fixed.

The number of instructions needed to do anything useful
to data is usually very large. In order to make it easier to
write useful programs, high-level languages have been developed,
which group sets of instructions together into useful
commands. In this way, modern programming languages
make it possible to write programs without intimate knowledge
of the hardware that the programs run on.

We illustrate this concept in figure 1. This shows the relationship
between the code base, the interpreter, and the code
that is executing. The code base is the program written in
some language. It becomes executing code via the action of
the interpreter. (By interpreter we mean whatever process is
accessing the code base and executing it.)
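
The relationship between code base, interpreter and data in this simplified CCOMP model can be sketched as follows. The instruction set, the names `CODE_BASE`, `OPS` and `interpret`, and the data are all hypothetical; the only point is that the interpreter applies one instruction at a time to one data word, and that the code base itself is never changed by execution.

```python
# Code base: a fixed sequence of (operation, argument) pairs.
CODE_BASE = [("add", 3), ("mul", 2), ("add", -1)]

# The primitive operations the interpreter knows how to execute.
OPS = {
    "add": lambda word, arg: word + arg,
    "mul": lambda word, arg: word * arg,
}

def interpret(code_base, data):
    """Apply each instruction in turn to each data word; the code is read-only."""
    output = []
    for word in data:
        for op, arg in code_base:
            word = OPS[op](word, arg)
        output.append(word)
    return output

result = interpret(CODE_BASE, [1, 2, 3])
print(result)
```

Nothing in `interpret` can write to `CODE_BASE`, which is the sense in which this model is computation without reflection.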

Although the von Neumann architecture that forms the
basis of CCOMP has program-data equivalence at the word
level, there is not usually a direct way for the executing code
to reflect aspects of its computation back to the code base
or the interpreter. Although some well-known languages
such as Java support such reflection, there is no requirement
to use reflection when writing programs. Such feedback is
needed if the program is to be verified, maintained and improved.
In the absence of automated feedback mechanisms,
these tasks are carried out by human programmers. Models
of computational reflection attempt to provide this feedback
at runtime, which is guaranteed to provide the current context
of the computation as it is being performed. Here we
begin to see the relationship of reflection to ALife: the current
context is the environment in which ALife systems must
survive, and the process of life is the equivalent of the computational
concept of runtime.

Figure 1: Running a computer program without reflection.
The code base is analysed by the interpreter, and run as executing
code. Solid boxes are data; dashed boxes are running
programs. Solid arrows indicate the provision of data.
Dashed arrows indicate an action upon a process.

Next, we discuss reflection in abstract terms, and relate
it to a series of related concepts in ALife. Then we review
the way CCOMP has used reflection in different programming
paradigms. The aim is to gather a set of observations
on how reflection might work in ALife, which we present
in the fourth section. We end with a discussion, and some
proposals on ways to implement reflective ALife systems.

Computational reflection 

“A reflective system contains structures which represent aspects
of itself” (Maes, 1987). Reflection is any act of computing
that is ‘about itself’: it is computation about the computation
that is being performed, without direct reference to
the goal of the computation. It is self-inspection at runtime,
and a candidate definition of what comprises a living system.
Self-inspection is of no use unless it is possible to act upon
the outcome of the inspection, so CCOMP reflection allows
self-modification: the ability to create (reify) new sorts of
first-class objects. We link this feature to living systems in
the latter half of this paper.
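
As a small illustration of self-inspection followed by self-modification, the sketch below uses Python's standard `inspect` module and `setattr`. The `Counter` class and the `describe` helper are hypothetical, and this is only one everyday form of reflection, not the architecture developed in this paper.

```python
import inspect

class Counter:
    def __init__(self):
        self.value = 0

    def step(self):
        self.value += 1

def describe(obj):
    """Self-inspection: list the public methods the object currently has."""
    return sorted(name for name, _ in inspect.getmembers(obj, callable)
                  if not name.startswith("_"))

c = Counter()
before = describe(c)

# Self-modification: reify a new method on the class while the program runs.
def reset(self):
    self.value = 0

setattr(Counter, "reset", reset)
after = describe(c)

c.step()
c.reset()
print(before, after, c.value)
```

The existing instance `c` gains the new behaviour without being recreated, because the running object and the class that describes it remain connected.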

Reflection requires that a representation of the code is
available as data at runtime; the reflective process uses this
to deduce which aspects of the program have particular computational
features. A data structure representing (a model
of) the program itself is created during execution of the program,
and is used to modify the execution of the program at
run-time. A special kind of interpreter (a) gives the running
system access to data representing the system; (b) establishes
the causal connection between the ‘executing code’
(the running system) and the ‘base code’ (the system representation).
Causal connection guarantees that modifications
to the executing system are reflected in the code base. All reflective
operations depend on the maintenance of the causal
connection to ensure that the code base remains a faithful
representation of the executing code. Different methods of
reflection use different ways to present the program representation
as data to the running code (Maes, 1987).

Figure 2: Procedural reflection. Key as in figure 1.

A reflective system can bring about modifications to itself
because it is able to generate and analyse data about its own
computation. It is able to detect an issue in the executing
code and modify the code base. The reflective interpreter
reinterprets the code base during execution. By endowing a
computational process with the power to monitor the computation
that is being performed, systems are (theoretically)
more able to tolerate faults, organise their processing, and even
organise their code base in the light of changing conditions
(Smith, 1984). How this is achieved depends upon the mode
of reflection being carried out. Two sub-classes of computational
reflection are described below. The first, procedural
reflection, allows reasoning about computation by running a
model of the interpreter on a model of the code whilst the
code is executing. The second, declarative reflection, attempts
to avoid the costs of procedural reflection by abstraction
of the properties of the executing code.

Procedural reflection 

Procedural reflection encapsulates the role of the interpreter
within the executing program, and assigns extra duties to it
(Maes, 1987). The components of a procedural reflective
architecture are shown in figure 2. The components of standard
computation from figure 1 are all present. An interpreter
process carries out the execution of the program, but
it also makes information about the computation available to
a monitoring process, shown as Monitor in the figure. The
encapsulated interpreter must guarantee a causal connection
between the code base and the executing code. The monitor
is able both to reason about the execution of the program,
and to act upon this reasoning by making changes to the code
base, so changing the execution of the program at run-time.

Figure 3: Meta-circularity of procedural reflection shown
as two reflective layers. Key as in figure 1.

Procedural reflection requires that a (more or less) complete
representation of the program is contained in the reflective
layer. This means that it is possible to generate the layer
below from the representation in the current layer using the
interpreter. There is a clash here because the twin goals of
specifying the system and being able to reason about it can
be incompatible, leading to duplication of information at the
very least.
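
A single reflective layer of this kind can be sketched as an interpreter that records a trace and calls a monitor after every step, where the monitor holds the complete code base and may rewrite it. All names, the two-instruction language and the overflow rule are hypothetical; the causal connection is modelled simply by the interpreter re-reading the shared code base at each step.

```python
class ReflectiveInterpreter:
    """Runs a code base and exposes its trace to a monitor (illustrative)."""

    def __init__(self, code_base):
        self.code_base = code_base   # shared with the monitor
        self.trace = []

    def run(self, x, monitor):
        pc = 0
        while pc < len(self.code_base):
            op, arg = self.code_base[pc]   # re-read each step: causal connection
            x = x + arg if op == "add" else x * arg
            self.trace.append((pc, op, arg, x))
            monitor(self)                  # the monitor may rewrite the code base
            pc += 1
        return x

def overflow_monitor(interp, limit=50):
    """If the value exceeds the limit, neutralise later multiplications."""
    pc, _op, _arg, value = interp.trace[-1]
    if value > limit:
        for i in range(pc + 1, len(interp.code_base)):
            if interp.code_base[i][0] == "mul":
                interp.code_base[i] = ("add", 0)

code_base = [("add", 5), ("mul", 10), ("mul", 10)]
interp = ReflectiveInterpreter(code_base)
result = interp.run(1, overflow_monitor)
print(result, code_base)
```

The monitor here is itself unmonitored ordinary code, which is exactly the point at which the meta-circularity discussed next arises.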

Challenges in procedural reflection: The method of self-representation
offered by procedural reflection has its challenges,
centering on the problem of exactly when to spawn
a process that is ‘about’ another process, since each process
has a computational cost in terms of RAM and CPU. We enter
the domain of meta-circularity when we realise that the
reflective layer in figure 2 is itself a running program. If the
interpreter has to have a complete representation of the relationship
between the code base and the running program,
it follows that a fully reflective architecture would need a
second reflective layer, figure 3. Since the reflective representation
is part of the running code, it must also be monitored,
at a higher level. Following this reasoning, we see that
the recursion in this model can extend ad infinitum, whereby
a hierarchy of processes is spawned, each monitoring the
process below and with only the process at the bottom doing
any actual work. This problem is avoided by bending the
rules slightly, letting the interpreters represent only parts of
the system at each level and eventually deciding that further
recursion is no longer fruitful.

Figure 4: Declarative reflection. Key as in figure 1.

In addition, the role of the interpreter in procedural reflection
is complex since it has two tasks to perform: interpreting
the code base, and feeding back information on the
computation to the monitor. If we forsake the embedding
of the interpreter within the reflective layer, some of these
problems can be avoided.
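
The tower of monitors described under the challenges of procedural reflection can be caricatured as a recursive function in which each layer spawns the layer that monitors it. This is a toy sketch with hypothetical names: without a cut-off the recursion never terminates (Python stops it with a `RecursionError`), and with a cut-off the designer has decided in advance how many layers are worthwhile.

```python
def spawn_layer(level, max_levels=None):
    """Each reflective layer spawns a monitor for itself at level + 1."""
    if max_levels is not None and level >= max_levels:
        return level                      # decide further recursion is not fruitful
    return spawn_layer(level + 1, max_levels)

# Fully reflective tower: every monitor needs its own monitor.
try:
    spawn_layer(0)
    unbounded_failed = False
except RecursionError:
    unbounded_failed = True

# Bending the rules: stop after a fixed number of layers.
top = spawn_layer(0, max_levels=3)
print(unbounded_failed, top)
```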

Declarative reflection 

Declarative reflection avoids the need to specify the code
base exactly, and instead seeks to generate useful statements
about the system (Maes, 1987), for example, information
about the time and space complexity of the executing process.
The interpreter sits outside the reflective process, and
merely provides it with a set of metrics. The monitor can
then act on these metrics and change the code base, which is
then used as executing code by the interpreter (figure 4).

The advantage of this approach is that the danger of infinite
recursion of reflective layers is greatly reduced (although
still possible), and the duties of the interpreter are
more clearly defined.
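
A minimal sketch of the declarative style follows. The interpreter reports only summary metrics (here a step count), and the monitor changes the code base on the basis of those metrics alone. The routines, the `budget` threshold and all names are hypothetical; the binary search additionally assumes sorted data, something the metrics themselves cannot tell the monitor.

```python
def linear_search(data, target, tick):
    for i, value in enumerate(data):
        tick()
        if value == target:
            return i
    return -1

def binary_search(data, target, tick):
    lo, hi = 0, len(data) - 1
    while lo <= hi:
        tick()
        mid = (lo + hi) // 2
        if data[mid] == target:
            return mid
        if data[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

ROUTINES = {"linear": linear_search, "binary": binary_search}

def run_with_metrics(code_base, data):
    """Interpreter: run the selected routine and report only summary metrics."""
    steps = 0
    def tick():
        nonlocal steps
        steps += 1
    result = ROUTINES[code_base["search"]](data, code_base["target"], tick)
    return result, {"steps": steps, "n": len(data)}

def monitor(code_base, metrics, budget=20):
    """Act on the metrics only: switch routine when the step budget is exceeded."""
    if metrics["steps"] > budget:
        code_base["search"] = "binary"

code_base = {"search": "linear", "target": 900}
data = list(range(1000))
r1, m1 = run_with_metrics(code_base, data)
monitor(code_base, m1)
r2, m2 = run_with_metrics(code_base, data)
print(m1["steps"], m2["steps"])
```

The monitor sees what happened (too many steps) but not how, so the choice of remedy has to be fixed in advance by whoever wrote the monitor.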

Challenges in declarative reflection: The benefits of
declarative reflection come at the expense of the ability of
the reflective system to detect appropriate conditions that
should be acted upon. The metrics can describe only what
has been done by the system; it is much harder to give a description
of how the effect has happened, making it more difficult
for the Monitor to decide how to implement changes.

For this reason alone, reflective architectures are rarely
purely declarative. Most reflective architectures use elements
of both procedural reflection and of declarative reflection.

Reflective properties of ALife systems 

We seek ways to apply reflective ideas from CCOMP directly
to ALife systems, in the hope that the advantages of
reflection can be emulated. However, reflection in biology
is different from reflection in CCOMP. We are aiming for a
‘unified’ treatment of reflective processes in biology, and so
possibly to find ways to improve reflection both in ALife systems
and CCOMP generally.

Having reviewed procedural and declarative reflection,
we must also describe an alternative approach to reflection
based on phenomena and techniques observed in and inspired
by biological systems. But first, we briefly review
some of the issues in computational reflection from a biological
perspective.

The absence of a ‘designer’: How can a biological system
be ‘about itself’ in the absence of a pre-defined purpose?
There are two parts to this question. First we consider how
a design is specified, then we consider how this goal is met.
(See also Dennett (1971) for a discussion of design stances.)

Biological systems use the genotype as a specifier of the 
phenotype via what Pattee (1982) calls the ‘symbol-matter 
articulation’, in which the specification of a machine (the 
symbol side) and its implementation (the matter side) are 
related to one another. This articulation is analogous to the 
requirement for causal connection in reflective languages, 
but it is at its strongest where the system exhibits semantic 
closure (see later). 

Thus, in biology the design of the system seems to be absent
from the model of reflection. But what is the goal of
self-inspection and self-modification if there is no design to
which the system can be compared? The answer is that biological
systems introduce ‘purpose’ via evolution, by using
populations of solutions and applying selection to them. Our
goal in defining a bio-reflective architecture is to describe
how these phenomena combine to yield a reflective system.

Rejecting declarative reflection: Declarative reflection
offers a means of avoiding having a sophisticated interpreter
in each reflective layer, and reduces the risk of infinite recursion.
However, the declarative approach is rigid: it is
difficult to detect when it is yielding insufficient information
about the system, and it is difficult to implement new
declarative statements when required.

Declarative reflection is like extrinsic fitness functions in
Evolutionary Algorithms: it runs the risk of over-specifying
the problem at hand and ignoring innovative solutions. The
declarative approach is problematic in ALife because it adheres
to an unchanging (and unchangeable) description of
what the design is, implicit in the metrics that are used to
monitor the system. It is difficult to define exactly what the
declarative statements should be a priori, and so it becomes
difficult to define what should be measured in order to detect
what changes would be beneficial. The declarative approach
embeds too much of the reflection in the ‘physics’ of the
system (Hickinbotham et al., 2016), since the metrics are
not under control of the ‘biology’, and so cannot be changed
to improve its representation of the running system.


Rejecting procedural reflection: From the perspective of
computer science, the two disadvantages of procedural reflection
are that it places too many demands on the interpreter
to allow an efficient implementation, and that the
meta-circularity of the system leads potentially to infinite
recursion.

The goal of reflection is to bring about automation in improvement
of a computational system, in the same manner
as natural selection in biological systems. Biological systems
also exhibit recursion in that each organism is created
by an earlier organism via a replication process. A key point
is that biological systems are organised such that for most of
its lifetime an organism is autonomous. In CCOMP, reflection
requires the ability to recursively spawn new reflective
layers ad infinitum, but the ‘recursion’ in biology is the
phylogeny of the individual. In bio-reflection, the individual
does not need to spawn instances of its ancestors in order to
monitor its state, since the relevant information is packaged
up in its genome.

Reflectionless self-replicators: Reflection involves holding
a model of the code base and maintaining a causal connection
between this and the actual execution of the model
on the CPU. There is a direct analogy here with the relationship
between the genotype and the phenotype in biology.

In ALife systems, models of biology are subject to experiments
by computer simulation. Unlike biology, everything
about these model systems is knowable, but everything (including
all of the relevant physics) must be initialised, parameterised
and implemented. There are also many assumptions
about the appropriate representation of such simulations
in mainstream computers. The attraction of this approach
is that it makes clear the relationship between mechanisms
of biological innovation and how these models of
biology could be applied to (models of) computation.

Many self-replicating ALife systems exist, but these tend 
to be modelled on a hypothetical ‘RNA world’ in which 
each entity inspects an instance of itself in order to create 
a copy (Ray, 1991; Ofria and Wilke, 2004; Hickinbotham 
et al., 2010). These are automata chemistries (Dittrich et al., 
2001): artificial agent-based systems in which each agent is 
a program. 

Many ALife platforms contain instances of agents that iteratively
manufacture copies of themselves. A mechanism
for changing the copies, usually called mutation, is introduced
in order to explore the design space of the system’s
universe. In this sense the program that the agents are running
is self-modifying. This process of self-modification is
central to mechanisms of reflection.

Although these systems have shown innovation, there
seems to be an upper bound on the level of complexity they
can attain, even though there is no theoretical limit on the
innovation. Could this be related to reflection? We address
this question by turning to the work of Von Neumann
et al. (1966), in his theory of self-reproducing automata. The
point, also made recently by the McMullin group in Dublin
(Baugh, 2015; Hasegawa, 2015), is that these systems tend
to reproduce by a process of self-inspection. Von Neumann
indicated that there are limitations to reproduction by this
method, linked to the difficulty of ‘reading’ a machine of arbitrary
complexity. We argue in addition that these systems
are reflectionless; although they appear to be ‘about’ themselves,
they are merely sophisticated quines (self-copying
automata with no inputs) that make no reference to an abstract
model of what they are. We have made similar points
in Hickinbotham et al. (2011). We emphasise here that although
a system may have program-data equivalence, it is
not guaranteed to be ‘reflective’; to achieve this, further conditions
must be met.

Figure 5: von Neumann’s Universal Constructor Architecture.
Key as in figure 1.

Figure 6: Comparison of components of procedural reflection
(black text) from figure 2 and the Universal Constructor
(red text). Key as in figure 1.
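
Reproduction by self-inspection can be sketched as a copy loop that reads the running program word by word and writes each word into the offspring, with an occasional copying error. The alphabet, the program string and the function name are hypothetical and do not correspond to any particular ALife platform. Note that nothing in the loop refers to a separate description of what the program is, which is the sense in which such replicators are reflectionless.

```python
import random

ALPHABET = "ABCD"   # hypothetical instruction set

def replicate_by_inspection(program, mutation_rate, rng):
    """Copy a program word by word, reading the running program itself."""
    child = []
    for word in program:
        if rng.random() < mutation_rate:
            child.append(rng.choice(ALPHABET))   # copying error
        else:
            child.append(word)
    return "".join(child)

rng = random.Random(0)
parent = "ABCABCABCD"
exact = replicate_by_inspection(parent, 0.0, rng)
mutant = replicate_by_inspection(parent, 0.3, rng)
print(exact, mutant)
```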

Universal constructors: Von Neumann’s observations
about the limitations of reproduction by self-inspection led
to the development of his theory of self-reproducing automata.
He defined a set of sub-assemblies that together
formed a Universal Constructor (UC). The original was cellular
automaton-based, but the ideas translate to ALife and
biological systems. We follow the notation of McMullin
(2012) in the following.

The von Neumann architecture comprises four machines
A, B, C, D plus their machine descriptions φ(A, B, C, D),
figure 5. The process of self-reproduction is divided into
two parts, which allows machines of arbitrary complexity to
be duplicated. Only one entity in the system is copied by inspection.
This is G, which consists of an abstract description
of everything else in the system: G = φ(A, B, C, D). The
remaining four machines function as follows. A is the Constructor,
which can read G and construct (or express) functioning
machines from their descriptions. B is the Copier,
which can create copies of G by inspection (for this reason,
G is usually a one-dimensional sequence of instructions).
The operation of A and B with respect to G is governed by
C, a Control structure. D is Ancillary Machinery, which
carries out any other function of the system irrespective of
the self-replicating assemblages just described.
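
The division of labour between A, B, C, D and G can be sketched as follows. This is a loose illustration, not von Neumann's cellular-automaton construction: descriptions are plain data that name behaviours in a fixed table (`PHYSICS`, a hypothetical stand-in for the primitives of the underlying system), the Constructor expresses machines from G, the Copier duplicates G without interpreting it, and Control runs both to produce an offspring.

```python
import copy

def construct(description):
    """A: express functioning machines from their descriptions."""
    return {name: PHYSICS[spec] for name, spec in description.items()}

def copy_description(description):
    """B: duplicate the description by inspection, without interpreting it."""
    return copy.deepcopy(description)

def control(machines, description):
    """C: direct A and then B to act on G, yielding an offspring."""
    return machines["A"](description), machines["B"](description)

def ancillary(x):
    """D: any other function of the system (a placeholder here)."""
    return x + 1

# Fixed primitives that the descriptions refer to by name.
PHYSICS = {"construct": construct, "copy": copy_description,
           "control": control, "ancillary": ancillary}

# G = phi(A, B, C, D): a description of every machine in the system.
G = {"A": "construct", "B": "copy", "C": "control", "D": "ancillary"}

parent = construct(G)                                # bootstrap: express G once
child_machines, child_G = parent["C"](parent, G)     # one act of reproduction
print(sorted(child_machines), child_G == G)
```

Only G is copied by inspection; the machines themselves are never read, only rebuilt from their descriptions.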

We illustrate the overlap between von Neumann’s Universal
Constructor architecture and procedural reflection in
figure 6. The layout of this figure follows the procedural reflection
diagram in figure 2, and adds the UC nomenclature
in red. All of the components of UC bar one are present
in procedural reflection, but the naming conventions are different.
What we have called the Monitor is called the Controller
in the UC, but their roles are identical: to orchestrate
the operations of the other sub-assemblies in the overall machine.
The Interpreter is mirrored in the UC as the Constructor,
which takes a description of a machine and creates the
machine based on that description, in the same way that an
Interpreter reads source code and creates a working manifestation
of the code on a conventional computer architecture.
The Code Base in the procedural architecture is represented
in the UC as the symbol φ(D). Both of these labels represent
the abstract concept of a description of a functioning
machine: D is the functioning machine, and φ is a description
operator. The executing code in the procedural reflection
model is referred to as ‘D: Ancillary machinery’ in the
UC nomenclature. The only component that UC adds to the
procedural reflection model is the Copier, and a description
of all the machines, not just the ancillary D, in G. The copier
is responsible for duplicating the machine descriptions in φ.
We describe its role in bio-reflection below.

The layout of figure 6 is an unsatisfactory description of
bio-reflection because it falls victim to the meta-circular architecture
in the same way as shown in figure 3, but
the conceptual link is important for what follows. The UC
nomenclature in this figure already gives some clues about
what is missing from the model.

Bio-reflection 

Having made some observations about reflection in ALife,
we now propose a new architecture of self-modification,
which we call bio-reflection, since its development from
CCOMP reflection is inspired by biological processes.

Figure 7: Bio-reflective Architecture. Universal Constructor
terms are shown in red. Key as in figure 1.

Inspection of the mapping between the UC architecture 
and the procedural reflection architecture in figure 6 shows 
that some of the components of UC are missing from it. 
Firstly there is no machine description for anything but the 
Ancillary Machinery in a single reflective layer. Secondly, 
there is no Copier. 

The absence of descriptions of all the machines appears
to be the feature that forces reflective systems into the recursive
situation of figure 3. Von Neumann solved this problem
by specifying that all the components in the architecture be
represented by abstract descriptions.

The copier is missing from the reflective architecture for
two reasons. First, it is easy to copy source code in CCOMP.
Second, the concept of reflection requires that the interpreter
acts upon the source code while the program is running.
From this perspective, why would one bother to have an extra
machine in the architecture?

Why is a copier important? This is related to the absence
of a designer. It is hard to say what a living organism is
about, because there is no designer, and hence no purpose
to the organism. This makes the concept of control rather
more vague: what is it that the entity is being controlled for?

Due to mutation, variation in the expression of φ(X)
means that it is likely that a machine will fail sooner or later.
Having a population of individuals insures against that. A
population of machines is inevitably exposed to selection.
By making the copier an intrinsic part of the machinery, we
can ensure that successful individuals are reproduced in the
population.

So bio-reflection works on populations of machines, 
which gives the control component of the architecture we 
seek. This is part of the way we avoid infinite meta-circular 
reflection. But we have to accept that populations have their 
own associated computational costs. 

The bio-reflective architecture follows the UC architecture,
but with some external considerations that we describe
below. The overlap between the bio-reflective architecture
and UC is shown in figure 7. There are two classes of entity:
the code base and the executing code as in computational
reflection. The difference is that all the executing machines
in the system have a representation in the code base layer.
The idea is that, following the UC architecture, the machine
descriptions in the code base are sufficiently rich to allow
the machines to be constructed from them, but sufficiently
simple to be easy to copy by the copy machine. In this
way, the system is self-contained (semantically closed), and
there is no requirement for a recursive pattern of reflection
to further organise the self-modification. Reflective actions
that would have happened at a higher level of reflection are
handled via changes to the representation in the code base
(via the copier), and changes to the way that representation
is interpreted by the constructor. These two components
together guarantee causal connectedness in a semantically
closed manner, without the need for any external specification
(beyond the physics of the system, which should be
minimised).


Facets of bio-reflection: Mutation is different from self-modification,
because of the absence of a design: mutations
merely modify, then selection identifies which of the modifications
are improvements. Much of what a monitor has to do
is moved to an external process that runs at the population
level.

In order to generate the running code, several things must
be brought to bear on the description of the machine. The
functionality of the Interpreter is the most relevant to this
discussion, but this functionality depends on the ‘physics’ of
the system, which is not described by the genotype, but is implicitly
referenced by it. In this way, the abstract description
is incomplete, but consistent with the executing machines.
This feature allows us to sidestep the recursion that exists in
computational reflection.

The architecture allows new kinds of machines to be reified
via two routes. The first route involves the inaccurate
copying of the code base via the action of the copier, leading
to mutation in one of the machines. The second is a special
case of this: an inaccurate reification of the interpreter
machine has the potential to change the way the entire code
base is interpreted, leading to a change in the way all machines
are reified.
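
The two routes can be illustrated with a toy genome that holds both a machine description and a description of the interpreter that reads it. Everything here is hypothetical: the two-symbol alphabet, the operation names, and the `express` function are chosen only to show that a change in the machine description alters one machine, whereas a change in the interpreter description alters the meaning of the unchanged machine description.

```python
def express(genome):
    """Build a machine from a genome that also describes its own interpreter."""
    # Fixed 'physics': the primitive operations available to any interpreter.
    physics = {"inc": lambda x: x + 1, "dbl": lambda x: x * 2}
    # Interpreter description: which primitive each symbol 'a', 'b' denotes.
    table = dict(zip("ab", genome["interpreter"]))

    def machine(x):
        for symbol in genome["machine"]:
            x = physics[table[symbol]](x)
        return x

    return machine

original = {"interpreter": ["inc", "dbl"], "machine": "aab"}

# Route 1: a copying error in the description of one machine.
route1 = {"interpreter": ["inc", "dbl"], "machine": "abb"}

# Route 2: a copying error in the description of the interpreter itself.
route2 = {"interpreter": ["dbl", "inc"], "machine": "aab"}

outputs = [express(g)(1) for g in (original, route1, route2)]
print(outputs)
```

In the second route the machine description is identical to the original, yet the machine it yields behaves differently, because the description is being read by a different interpreter.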

In this way, the semantics of the code base are self-contained
because the machine that interprets the code base
is encoded in the code base. This feature guarantees causal
connectedness, but also allows the meaning of a code base
to be interpreted differently depending on the nature of the
interpreting machine, in what Pattee calls semantic closure.
ALife systems that reproduce via self-inspection do not have
this feature, and so cannot be said to be reflective.
Semantic Closure and Causal Connectedness: Causal
connection is a major component of semantic closure, but
it says nothing about the semantics of the system and how
they arise. In the bio-reflective architecture, semantic closure
means that the semantics of the machine descriptions
are embodied in the relationship between the executing machines
and their descriptions. This feature of the system is
most prominent in the encoding of the interpreter, which has
to read a description of itself, and construct a copy of itself
from that description. If, by mutation or other error during
construction, the function of the interpreter changes, then
the whole meaning of the machine descriptions is changed.

Examples of such a change of meaning are well known 
in biology (Foster, 2007). For example, the ‘SOS response’
to DNA damage in E. coli involves the expression of DNA
polymerases that are better able to copy damaged DNA,
but with lower fidelity, thus increasing the mutation
rate whilst the population is under stress. A bio-reflective
ALife system has the potential to emulate such
phenomena.

A different continuum Maes (1987) indicates that most 
computational reflective architectures were somewhere be¬ 
tween the Procedural and Declarative extremes. In most 
reflective systems, the code base is represented sufficiently 
explicitly to allow the reflection to occur, and the ‘miss¬ 
ing’ parts of the code base are represented by declarative 
statements. The bio-reflective approach offers a different 
perspective on this: in biological systems the code base 
(genome) is augmented by the physical and chemical pro¬ 
cesses in the cell, allowing the enzyme ‘machines’ to be 
constructed by the ribosome; in computational ALife systems,
the virtual physico-chemical processes are necessarily
less complex, consisting of the function of opcodes and the
‘given’ computational machinery (registers, stacks, etc.) assigned
to each individual.

Another continuum is to do with the relationship between 
mutation and selection: mutation pushes newly created au¬ 
tomata towards a random/disordered state, but selection en¬ 
sures that the reflective processes of interpretation and mon¬ 
itoring are maintained. 

Discussion 

Reflection and the design stance If computation is to be 
about itself, we need some definition of what the ‘self’ is. 
Here, biology is more straightforward, because the design 
of the organism is self-contained. The issue is more compli¬ 
cated in software because it is engineered: it has a designer, 
a purpose, and an implementation in source code. 

We have an immediate difference between engineered reflection
and emergent reflection. Engineered reflection requires
the conscious act of building a reflective system.
Computation can happen without reflection precisely because
the computation is engineered: it has a designer. One
of the problems is that reflection within these architectures
holds the designer as a ‘third option’ in which the ultimate
design can reside. The AI community has taken reflection
to be about debugging, optimising, etc., which is the origin
of Smith’s ‘computing about itself’ metaphor. Reflective
acts are anything that is done ‘about’ the computation. This
assumes the design is ‘known’ and that we are attempting to
refine the implementation to reflect the design.

How can biological processes be reflective if they are not 
designed? This is a core question that we must answer if 
we are to build a bridge between computer and biological 
sciences. Firstly, we note that in some ways, the absence 
of a designer makes reflection simpler: the ‘ultimate’ de¬ 
sign is simply the system itself; whether the design resides 
in the genotype or the phenotype is immaterial. Whereas in 
CCOMP, programs that did things could exist before reflec¬ 
tive processes were available, in biology this was not possi¬ 
ble: how could a process emerge that was ‘doing’ something 
before the system became self-referential? Without the self- 
referential process, the biology is nothing more than com¬ 
plex (carbon) chemistry. Only when a specification and an 
interpreter became available did life truly emerge. 

We are not refuting RNA world with this argument; we 
are merely stating that even there, some RNA would be tem¬ 
plate, and some would be machine. 

On the ‘self’ in Computer Science Unlike biological or¬ 
ganisms, computer programs are designed. Following from 
this, Smith (1984) noted that we have an issue: what is the 
design of the program? Is it the concepts in the program¬ 
ming team’s heads? The source code? The executing code? 
The answer appears to be a combination of all three: the 
concepts give the broad thrust of what needs to be done; the 
source code is an instantiation of these ideas, plus bugs; the 
executing code is what actually happens, which is what the 
source code is trying to persuade the interpreter to do. But 
some of the actions of the interpreter then become part of 
the design, and these are not necessarily in the source code. 

McMullin’s lab has attempted to build instances of the 
von Neumann replicating architecture in Tierra and Avida. 
Both implementations of the von Neumann architecture in 
these systems tend to collapse to their original RNA world 
configurations of replication, unless strict constraints are 
placed on the evolution. For example in (Baugh, 2015), the 
system could only be made viable by deleting any offspring 
with a different length from the parent. These results are 
important, because they allow us to identify features of au¬ 
tomata chemistries that foster the more sophisticated self- 
reproducing entities described by von Neumann. 

The model we propose could be implemented as an emulation
of biology by allowing the monitor to act as a gene
regulator only, or as a CCOMP-style learning system,
by configuring the Monitor to pass ‘message sends’ to the
selection process. In either configuration, the reflective acts
are fully autonomous and internally consistent.

Conclusion 

The dream of AI is to have systems that adjust themselves to 
meet our needs, be they robotic, informational or biological. 
The foundation of these systems is that they are reflective: 
they are able to reason about themselves. This problem is 
encountered in ALife, Artificial Intelligence and CCOMP, as 
noted by Pattee (1982). By considering ALife as reflective 
systems, we are more able to draw from this wider body of 
research and move the field forward. 

It is remarkable that the Universal Constructor design 
from 1949 is still relevant today. By casting it as a reflec¬ 
tive system, new emphasis can be placed on the components 
of an ALife system, suggesting new avenues for research 
in both Artificial Life and Computer Science. To quote a 
review of the previous draft of this paper: “Interesting ques¬ 
tions present themselves: should we be using reflective lan¬ 
guages to build reflective replicators? Would self-awareness 
in a replicator, i.e., introspection into its own method of 
replication/ecological niche, enable an enhanced form of
autoconstructive evolution a la Spector and Robinson (2002)?
A Lamarckian-Darwinism where organisms simulate their 
offspring in sandboxes and hack their own genomes accord¬ 
ingly?” 

CCOMP reflection does not account for copying of code 
bases in the reflective act, and the von Neumann architec¬ 
ture does not make clear the role of the monitor, or take 
natural selection into account within the model. With our 
bio-reflective architecture presented here, we have arrived at 
a consistent representation that clarifies the distinction be¬ 
tween CCOMP and biology, and provides an intellectual ba¬ 
sis for future ALife implementations. 

Abstract

Using methods from algorithmic complexity theory, we propose
robust computational definitions for open-ended evolution
(OEE) and adaptability of computable dynamical systems.
With this framework, we show that decidability imposes absolute
limits on the growth of complexity in computable dynamical
systems, up to a logarithm of a logarithmic term.
Conversely, systems that exhibit open-ended evolution must 
be undecidable and have irreducible behaviour through the 
evolution of the system. Complexity is assessed in terms 
of three measures: sophistication, coarse sophistication and 
busy beaver logical depth. 


Introduction and Preliminaries 

Broadly speaking, a dynamical system is one that changes
over time. Predicting the future behaviour of a dynamical
system is a central issue for science in general: scientific theories
are tested on the accuracy of their predictions, and
establishing invariant properties through the evolution of a
system is an important goal. Limits to this predictability are
known in science. For instance, chaos theory establishes the
existence of systems in which small deficits in the information
about the initial states make accurate predictions of future
states unattainable. However, in this document we focus
on systems for which we have unambiguous, finite (in size
and time) and complete descriptions of their initial states and
their behaviour: computable dynamical systems.

Since their formalization by Church and Turing, the class
of computable systems has shown that, even without information
deficits (i.e., with complete descriptions), there are
future states that cannot be predicted, in particular the state
known as the halting state (Turing, 1936). We will use this result
to show how prediction imposes limits on the growth of
complexity during the evolution of a system.

The relationship between dynamical systems, computability
and Turing machines, along with the implied unpredictability
of their behaviour, was observed by Moore
(Moore, 1991) and Wolfram (Wolfram, 2002). Delvenne,
Kurka and Blondel (Delvenne et al., 2006) have explored robust
definitions for computable (effective) dynamical systems
and universality generalizing Turing’s halting states,
along with conditions and implications for universality, decidability
and their relationship with chaos. Undecidability
of certain properties for certain analytic dynamical systems
has been studied by Bournez (Bournez et al., 2013). The
definitions and general approach used in this paper differ
from those sources, but are ultimately related.

*The adapted definitions and complete proofs used in this article
can be found in the extended version of this article:
http://complexitycalculator.com/algorithmicOEE.zip

Computable Functions 

In a broad sense, an object x is computable if it can be described
by a Turing machine (Turing, 1936); for example, if
there exists a Turing machine that produces x as an output.
It is clear that any finite string over a finite alphabet is a computable
object. Following Turing’s tradition, we provide below
a more formal definition.

As usual, we can define a 1 to 1 mapping between the 
set of all finite binary strings B* = {0,1}* and the natural 
numbers by the relation induced by the lexicographic order 
of the form: {(“”, 0), (“0”, 1), (“1”, 2), (“00”, 3),...}. Us¬ 
ing this relation we can see all natural numbers (or positive 
integers) as binary strings and vice versa. Accordingly all 
natural numbers are computable. 
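
As an illustrative sketch (not part of the original text), the correspondence above between binary strings and natural numbers can be written in a few lines of Python. The function names are our own, and prepending a leading 1 before reading the string as a binary numeral is just one convenient way to realise the ordering ("", "0", "1", "00", ...).

```python
def string_to_natural(s: str) -> int:
    """Map a finite binary string to its index in the order "", "0", "1", "00", ..."""
    if any(c not in "01" for c in s):
        raise ValueError("expected a string over the alphabet {0, 1}")
    # Prepending "1" keeps leading zeros significant; subtracting 1 sends "" to 0.
    return int("1" + s, 2) - 1


def natural_to_string(n: int) -> str:
    """Inverse mapping: recover the binary string with index n."""
    if n < 0:
        raise ValueError("expected a natural number")
    # bin(n + 1) looks like "0b1...", so dropping three characters removes "0b1".
    return bin(n + 1)[3:]
```

Because the two functions are mutually inverse, any natural number can be treated as a binary string and vice versa, which is the only property the later definitions rely on.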

A string p is a valid program for the Turing machine T if
during the execution of T with p as input all the characters
in p are read. We call T(p) the output of the machine, if it
stops. A Turing machine is prefix-free if no valid program
can be a proper prefix of another valid program (but it can
be a postfix of one). We call a valid program a self-delimited
object. Note that, given the relationship between natural
numbers and binary strings, the set of all valid programs is
an infinite proper subset of the natural numbers.

Formally, a function $f : \mathbb{N} \to \mathbb{N}$ is computable if there
exists a Turing machine $T$ such that $f(x) = T(x)$. A Turing
machine $U$ is called universal if there exists a computable
function $g$ such that for every Turing machine $T$ computing
a function $f$ there exists a string $\langle T \rangle \in B^*$ such that
$f(x) = U(\langle T \rangle g(x))$, where $\langle T \rangle g(x)$ is the concatenation
of the strings $\langle T \rangle$ and $g(x)$.
In this case, $\langle T \rangle$ and $g(x)$ are called codifications
or representations of the function $f$ and the natural
number $x$, respectively. From now on we will denote by $\langle f \rangle$
and $\langle x \rangle$ the codifications of $f$ and $x$. The codification $g(x)$ is
unambiguous if it is injective.

For functions with more than one variable, if $x$ is a pair
$x = (x_1, x_2)$, we say that the codification $g(x)$ is unambiguous
if it is injective and the inverse functions $g_1^{-1} : g(x) \mapsto x_1$
and $g_2^{-1} : g(x) \mapsto x_2$ are computable. If $x$ is a tuple
$(x_1, \ldots, x_i, \ldots, x_n)$, then the codification $g(x)$ is unambiguous
if the function $(\langle x \rangle, i) \mapsto x_i$ is computable.

A sequence of strings $s_1, s_2, \ldots, s_i, \ldots$ is computable if the
function $i \mapsto s_i$ is computable. A real number is computable
if its decimal expansion is a computable sequence.
For complex numbers and higher-dimensional spaces, we
say that they are computable if each of their coordinates is
also computable.

Finally, for each of the described objects, we call the representation
of the associated Turing machine the representation
of the object for the reference Turing machine $U$, and we
define computability of further objects by considering their
representations. For example, a function $f : \mathbb{R} \to \mathbb{R}$ is computable
if the mapping $\langle x_i \rangle \mapsto \langle f(x_i) \rangle$ is computable, and
we will denote by $\langle f \rangle$ the representation of the associated
Turing machine, calling it the codification of $f$ itself.

Algorithmic Descriptive Complexity 

Given a prefix-free universal Turing machine $U$ with alphabet
$\Sigma$, the algorithmic descriptive complexity (also known as
Kolmogorov complexity and Kolmogorov-Chaitin complexity
(Kolmogorov, 1965; Chaitin, 1982)) of a string $s \in \Sigma^*$ is
defined as

$$K_U(s) = \min\{|p| : U(p) = s\},$$

where $|p|$ is the number of characters of $p$.

The algorithmic descriptive complexity measures the
minimum amount of information needed to fully describe a
computable object within the framework of a universal Turing
machine $U$. If $U(p) = s$ then the program $p$ is called
a description of $s$. The first of the smallest descriptions (in
alphabetical order) is denoted by $s^*$, and $\langle s \rangle$ denotes a not
necessarily minimal description computable over the class of
objects. If $M$ is a Turing machine, a program $p$ is a description
or codification of $M$ for $U$ if for every string $s$ we
have that $M(s) = U(p \langle s \rangle)$. In the case of numbers, functions,
sequences and other computable objects we consider
the descriptive complexity of their smallest descriptions. For
example, for a computable function $f : \mathbb{R} \to \mathbb{R}$, $K(f)$ is
defined as $K(f^*)$, where $f^* \in B^*$ is the first of the minimal
descriptions for $f$.

Of particular importance for this document is the conditional
descriptive complexity, which is defined as:

$$K_U(s \mid r) = \min\{|p| : U(pr) = s\},$$


where pr is the concatenation of p and r. This measure can 
be interpreted as the smallest amount of information needed 
to describe s given a full description of r. We can think of p 
as a program with input r. 

One of the most important properties of the descriptive
complexity measure is its stability: the difference between
the descriptive complexities of an object under two universal
Turing machines is at most a constant. Therefore the reference
machine $U$ is usually omitted in favor of the universal
measure $K$. From now on we will omit the subscript from
the measure.
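
The descriptive complexity itself is not computable, but any lossless compressor yields an upper bound on it up to an additive constant. The following minimal sketch, which is our own illustration and not the authors' method, uses Python's standard `zlib` module as such a stand-in. The function names and the difference-based proxy for the conditional measure are assumptions made for the example.

```python
import random
import zlib


def compressed_length(s: bytes) -> int:
    """Length in bytes of a zlib encoding of s: a crude stand-in for K(s)."""
    return len(zlib.compress(s, 9))


def conditional_length(s: bytes, r: bytes) -> int:
    """Rough proxy for K(s | r): extra bytes needed to encode s after r."""
    return max(0, compressed_length(r + s) - compressed_length(r))


rng = random.Random(0)  # fixed seed so the example is reproducible
regular = b"01" * 500  # 1000 bytes with an obvious short description
noisy = bytes(rng.getrandbits(8) for _ in range(1000))  # 1000 pseudo-random bytes

print(compressed_length(regular), compressed_length(noisy))
print(conditional_length(noisy, noisy), compressed_length(noisy))
```

The regular string compresses to far fewer bytes than the pseudo-random one, and describing `noisy` given `noisy` costs much less than describing it from scratch, mirroring the intended reading of the conditional measure. The numbers depend on the compressor and say nothing about the true value of $K$ beyond an upper bound.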

Randomness A string $x$ is known as $r$-random or incompressible
if $K(x) \geq |x| - r$. This definition states that
a string is random if it does not have a significantly shorter
complete description than the string itself. A simple counting
argument shows the existence of random strings. Now,
it is easy to verify that every string $s$ has a self-delimited,
computable, unambiguous codification of the
form $1^{\log |s|} 0 |s| s$ (Li and Vitanyi, 1997, section 1.4). Therefore,
there exists a natural $r$ such that if $x$ is $r$-random then
$K(x) = |x| - r + O(\log |x|)$, where $O(\log |x|)$ is a positive
term. We will say that such strings hold the randomness
inequality tightly.
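
To make the self-delimited codification concrete, here is a small sketch under one reading of the form cited from Li and Vitanyi: a unary marker giving the number of bits of $|s|$, a separating 0, the length $|s|$ in binary, and then $s$ itself. The function names `encode` and `decode` are ours, and strings are represented as Python `str` objects over the characters 0 and 1.

```python
from typing import Tuple


def encode(s: str) -> str:
    """Self-delimiting code: 1^k 0 <len(s) in binary, k bits> s."""
    length_bits = bin(len(s))[2:]
    return "1" * len(length_bits) + "0" + length_bits + s


def decode(stream: str) -> Tuple[str, str]:
    """Read one codeword from the front of stream; return (s, remainder)."""
    k = stream.index("0")  # position of the first 0 equals the number of leading 1s
    start = k + 1
    n = int(stream[start:start + k], 2)  # the k bits after the 0 give len(s)
    body_start = start + k
    body = stream[body_start:body_start + n]
    if len(body) < n:
        raise ValueError("truncated codeword")
    return body, stream[body_start + n:]
```

Since the decoder knows where each codeword ends, concatenated codewords can be split without separators, and the overhead is $2k + 1$ bits, where $k$ is the number of bits of $|s|$, which is logarithmic in the length of $s$.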

Let $M$ be a halting Turing machine with description $\langle M \rangle$
for the reference machine $U$. A simple argument shows
that the halting time of $M$ cannot be a large random number:
let $U_H$ be a Turing machine that emulates $U$ while counting
the number of steps, returning the execution time upon
halting; if $r$ is a large random number then $M$ cannot stop
in time $r$, otherwise the program $\langle U_H \rangle \langle M \rangle$ would give us a
short description of $r$. This argument is summarized by the
following inequality:

$$K(T(M)) \leq K(M) + O(1), \qquad (1)$$

where $T(M)$ is the number of steps that it took the machine
$M$ to reach the halting state, i.e. the execution time of the
machine $M$.

Computable Dynamical Systems 

Formally, a dynamical system is a rule of evolution in time
within a state space, defined as the set of all
possible states of the system (Meiss, 2007). In this work
we focus on a functional model for dynamical systems
with a constant initial state and variables representing the
previous state and the time of the system. This model allows
us to set halting states for each time on a discrete scale in
order to study the impact of the descriptive complexity of
time during the evolution of a discrete computable system.

A deterministic discrete space system is defined by an
evolution function (or rule) of the form $M_t = S(M_0, t)$,
where $M_0$ is called the initial state and $t$ is a positive integer
called the time variable of the system. The sequence of
states $M_0, M_1, \ldots, M_t, \ldots$ is called the evolution of the system.
Given a reference universal Turing machine $U$, if $S$ is a
computable function and $M_0$ is a computable object, we will
say that $S$ is a computable dynamical system. An important
property of computable dynamical systems is the uniqueness
of the successor state, which implies that equal states must
evolve equally given the same evolution function; in other
words:

$$M_t = M_{t'} \implies M_{t'+1} = M_{t+1}. \qquad (2)$$

The converse is not necessarily true.

Now, a complete description of a computable system
$S(M_0, t)$ should contain enough information to compute the
state of the system at any time, and hence it must entail the
codification of its evolution function $S$ and a description of
the initial state $M_0$, which is denoted by $\langle M_0 \rangle$. As a consequence,
if we only describe the system at time $t$ by a
codification of $M_t$, then we do not have enough information
to compute the successive states of the system. So we
will define the complete description of a computable system
at the time $t$ as an unambiguous codification of the ordered
pair composed of $\langle S \rangle$ and $\langle M_t \rangle$, i.e. $\langle (S, \langle M_t \rangle) \rangle$, with
$\langle (S, \langle M_0 \rangle) \rangle$ representing the initial state of the system. It is
important to note that, for any computable and unambiguous
codification function $g$ of the stated pair, we have

$$K(\langle (S, \langle M_t \rangle) \rangle) \leq K(S) + K(M_0) + K(t) + O(1),$$

as we can write a program that uses the descriptions of $S$,
$M_0$ and $t$ to find the parameters and then evaluate $S(M_0, t)$,
finally producing $M_t$.

Open-Ended Evolution in Computable Dynamical 
Systems 

Defining and establishing the properties required for a system
to exhibit open-ended evolution (OEE) is considered
an open question (Bedau et al., 2000; Soros and Stanley,
2014; Standish, 2003), and OEE has been proposed as a required
property of evolutionary systems capable of producing
life (Ruiz-Mirazo et al., 2002). This has been implicitly
verified by various in silico experiments (Lindgren, 1992;
Adami and Brown, 1994; Lehman and Stanley, 2008; Auerbach
and Bongard, 2014).

A line of thought posits that open-ended evolutionary systems
tend to produce families of objects of increasing complexity
(Bedau, 1998; Auerbach and Bongard, 2014). Following
this idea, OEE in a computable dynamical system
can be characterized as a process that has the property of
producing families of objects of increasing complexity. Formally,
given a complexity measure $C$, we say that a computable
dynamical system $S$ exhibits open-ended evolution
with respect to $C$ if for every time $t$ there exists a time
$t'$ such that the complexity of the system at the time $t'$ is
greater than the complexity at the time $t$, i.e.
$C(S(M_0, t)) < C(S(M_0, t'))$, where a complexity measure is a (not necessarily
computable) function that goes from the state space to
a positive numeric space.

The existence of such systems is trivial for complexity 
measures on which any infinite set of the natural numbers 
(not necessarily computable) contains a subset where the 
measure grows strictly: 

Lemma 1. Let $C$ be a complexity measure such that any
infinite set of natural numbers has a subset where $C$ grows
strictly. Then a computable system $S(M_0, t)$
produces an infinite number of different states if and only if
it exhibits OEE for $C$.

Given the previous lemma, a trivial computable system
that simply produces all the strings in order exhibits OEE
on a class of complexity measures that includes algorithmic
descriptive complexity. However, intuitively, we conjecture
that such systems have a much simpler behaviour compared
to what we observe in the natural world and the cited artificial
life systems. We can avoid some of these issues with a
stronger version of OEE.

Definition 2. A sequence of naturals $n_0, n_1, \ldots, n_i, \ldots$ exhibits
strong open-ended evolution (strong OEE) with respect
to a complexity measure $C$ if for every index $i$ there
exists an index $i'$ such that $C(n_i) < C(n_{i'})$, and the complexity
of the sequence $C(n_0), C(n_1), \ldots, C(n_i), \ldots$ does not
drop significantly, i.e. $i \leq j$ implies $C(n_i) \leq C(n_j) + \gamma(j)$,
where $\gamma(j)$ is a positive function that does not grow significantly.

It is important to note that, while the definition of OEE allows
significant drops of complexity during the evolution of
a system, strong OEE requires that the complexity of the system
does not decrease significantly during its evolution. In particular,
we will ask for $C(n_j) - \gamma(j)$ not to be upper-bounded
for any infinite subsequence.
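
The difference between OEE and strong OEE can be illustrated on a finite window of complexity values. The sketch below is only a finite approximation under our own assumptions: the actual definitions quantify over all times, the complexity measure need not be computable, and the names `grows_on_prefix`, `no_significant_drop` and the `horizon` parameter are hypothetical.

```python
from typing import Callable, Sequence


def grows_on_prefix(complexities: Sequence[float], horizon: int) -> bool:
    """OEE restricted to a window: every t < horizon has a later t' with C(t') > C(t)."""
    return all(
        any(later > c for later in complexities[t + 1:])
        for t, c in enumerate(complexities[:horizon])
    )


def no_significant_drop(
    complexities: Sequence[float], gamma: Callable[[int], float]
) -> bool:
    """Check that i <= j implies C(n_i) <= C(n_j) + gamma(j) on the window."""
    running_max = float("-inf")
    for j, c in enumerate(complexities):
        running_max = max(running_max, c)  # largest C(n_i) over all i <= j
        if running_max > c + gamma(j):
            return False
    return True


steady = [1, 2, 3, 4, 5, 6]    # complexity never drops
sawtooth = [1, 5, 1, 6, 1, 7]  # keeps reaching new highs but collapses in between
```

Both sequences pass the window version of OEE, but only `steady` also satisfies the drop condition for a small tolerance such as `gamma = lambda j: 1`, which is the behaviour that strong OEE is meant to single out.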

Various complexity measures have been proposed that 
assign low complexity to an infinity of natural numbers 
deemed to be simple. Two examples of such measures are 
logical depth (Bennett, 1988) and sophistication (Koppel, 
1988). Nonetheless, if C is a complexity measure capable
of measuring OEE then there must exist infinite sets where
C grows strictly. A trivial counting argument shows that the
algorithmic descriptive complexity is unbounded in any infinite
set; therefore it is also unbounded in any set where any
other complexity measure grows strictly. Formally:

Lemma 3. If a system S exhibits OEE (and strong OEE) for 
a complexity measure C then it also shows OEE with respect 
to the descriptive complexity K. 

Given the previous lemma, the results shown in the next sec¬ 
tion can also be extended to any other complexity measure 
capable of showing OEE. 

A Computational Model for Adaptation

Let us start by characterizing the evolution of an organism or
a population by a computable dynamical system. It has been
argued that, in order for adaptation and survival to be possible,
an organism must contain a representation of the environment
so that, given a reading of the representation, the
organism can choose a behaviour accordingly (Zenil et al.,
2012). The more accurate this representation is, the better
the organism is adapted to its environment. If the organism
is computable, this information can be codified by
a computable structure. We will denote this structure by
$M_t$, where $t$ stands for the time corresponding to each of the
stages of the evolution of the organism. This information is
then processed following a finitely specified, unambiguous
set of rules that, in finite time, will determine the adapted
behaviour of the organism according to the information codified
by $M_t$. We will denote this behaviour (or a theory explaining
it) by the program $p_t$. An adapted system is one
that produces an acceptable approximation of its environment.
An environment can also be represented by a computable
structure $E$. In other words, the system is adapted if
$p_t(M_t)$ produces $E$. Based on this idea we propose a robust,
formal definition for adaptation:

Definition 4. Let $K$ be the prefix-free descriptive complexity.
We say that the system at the state $M_n$ is $\epsilon$-adapted to
the environment $E$ if:

$$K(E \mid S(M_0, E, n)) \leq \epsilon. \qquad (3)$$

The inequality states that the minimal amount of information
that is needed to describe $E$ from a complete description
of $M_n$ is $\epsilon$ or less. This information is provided in the form of
a program $p$ that produces $E$ from the system at the time $n$.
We will define such programs $p$ as the adapted behaviour of
the system. Uniqueness of $p$ is not required.

The proposed structure for adapted systems is robust, since
$K(E \mid S(M_0, E, n))$ is less than or equal to the number of
characters needed to describe any computable method of describing
$E$ from the state of the system at the time $n$, whether
it be a computable theory for adaptation or a computable
model for an organism that tries to predict $E$. It follows that
any computable characterization of adaptation that can be
described within $\epsilon$ bits meets the definition of $\epsilon$-adapted,
given a suitable choice of $E$, the adaptation condition
for any given environment.

As a simple example, we can think of an organism that
must find the food located at the coordinates $(j, k)$ on a grid
in order to survive. If the information of an organism is codified
by a computable structure $M$ (such as DNA), and there
is a set of finitely specified, unambiguous rules that govern
how this information is used (such as the ones specified
by biochemistry and biological theories) codified by a
program $p$, then we say that the organism finds the food if
$p(M) = (j, k)$. If $|\langle p \rangle| \leq \epsilon$, then we say that the organism
is adapted according to a behaviour that can be described
within $\epsilon$ characters. The proposed model for adaptation is
not limited to such simple interactions. For a start, we can
suppose that the organism sees a grid, denoted by $g$, of size
$n \times m$ with food at the coordinates $(j, k)$. The environment
can be codified as a function $E$ such that $E(g) = (j, k)$, and
$\epsilon$-adapted implies that the organism defined by the genetic
code $M$, which is interpreted by a theory or behaviour written
in $\epsilon$ bits, is capable of finding the food upon seeing $g$.
Similarly, more complex computational structures and interactions
imply $\epsilon$-adaptation.
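
The grid example can be made concrete with a toy sketch. Everything here is an assumption introduced for illustration: the grid is a list of rows with a single food cell marked 1, the genome $M$ is a string of moves (D for down, R for right) starting at the origin, and the functions `environment`, `behaviour` and `finds_food` stand in for $E$, $p$ and the condition $p(M) = E(g)$. The length bound on the description of $p$ is not modelled.

```python
from typing import List, Tuple

Grid = List[List[int]]


def environment(grid: Grid) -> Tuple[int, int]:
    """E(g): coordinates (j, k) of the first cell marked as food."""
    for j, row in enumerate(grid):
        for k, cell in enumerate(row):
            if cell == 1:
                return (j, k)
    raise ValueError("grid contains no food")


def behaviour(genome: str) -> Tuple[int, int]:
    """p(M): interpret the genome as moves from (0, 0) and return the end point."""
    j = k = 0
    for move in genome:
        if move == "D":
            j += 1
        elif move == "R":
            k += 1
        else:
            raise ValueError(f"unknown move {move!r}")
    return (j, k)


def finds_food(genome: str, grid: Grid) -> bool:
    """The organism is adapted in this toy sense when p(M) equals E(g)."""
    return behaviour(genome) == environment(grid)


grid = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
]
```

With this grid, the genome `"DDR"` reaches the food at row 2, column 1, whereas `"RR"` does not, so only the former counts as adapted under the toy behaviour.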

Now, describing an evolutionary system that (eventually)
produces an $\epsilon$-adapted system is trivial via an enumeration
machine (the program that produces all the natural numbers
in order), as it will eventually produce $E$ itself; moreover,
we want the output of our process to remain adapted.
Therefore we propose a stronger condition called convergence:

Definition 5. Given the description of a computable dynamical
system $S(M_0, E, t)$, where $t \in \mathbb{N}$ is the variable of time,
$M_0$ is an initial state and $E$ is an environment, we say that
the system $S$ converges towards $E$ with degree $\epsilon$ if there exists
$\delta$ such that $t \geq \delta$ implies $K(E \mid S(M_0, E, t)) \leq \epsilon$.

For a fixed initial state $M_0$ and environment $E$, it is easy
to see that the descriptive complexity of a state of the system
depends mostly on $t$: we can describe a program that,
given full descriptions of $S$, $E$, $M_0$ and $t$, finds $S(M_0, E, t)$;
therefore

$$K(S(M_0, E, t)) \leq K(S) + K(E) + K(M_0) + K(t) + O(1), \qquad (4)$$

where the constant term is the length of the program described.
In other words, as the time $t$ grows, time is the
main driver of the descriptive complexity within the system.

Non-Randomness of Decidable Convergence Times 

One of the most important issues for science is the prediction
of the future behaviour of dynamical systems. The prediction
that we focus on is that of the first state of convergence
(definition 5): will a system converge, and how long will it
take? In this section we show the first limit that decidability
imposes on the complexity of the first convergent
state. A consequence of this is the existence of undecidable
adapted states.

Formally, for the convergence of a system $S$ with degree
$\epsilon$ to be decidable there must exist an algorithm $D_\epsilon$ such that
$D_\epsilon(S, M_0, E, \delta) = 1$ if the system is convergent at the time
$\delta$ and 0 otherwise. Moreover, we can describe a machine
$P$ such that, given full descriptions of $D_\epsilon$, $S$, $M_0$ and $E$, it runs
$D_\epsilon$ with inputs $S$, $M_0$ and $E$ while running over all the possible
times $t$, returning the first $t$ for which the system converges.
Note that $\delta = P(\langle D_\epsilon \rangle \langle S \rangle \langle M_0 \rangle \langle E \rangle)$; hence we have
a short description of $\delta$, and therefore $\delta$ cannot be random: if
$S(M_0, E, t)$ is a convergent system then

$$K(\delta) \leq K(D_\epsilon) + K(S) + K(E) + K(M_0) + O(1), \qquad (5)$$

where $\delta$ is the first time at which convergence is reached.
Note that all the variables are known at the initial state of
the system. This result can be summarized by the following
lemma:

Lemma 6. Let $S$ be a system convergent at the time $\delta$. If $\delta$
is considerably more descriptively complex than the system
and the environment, i.e. for every reasonably large natural
number $d$ we have that

$$K(\delta) > K(S) + K(E) + K(M_0) + d,$$

then $\delta$ cannot be found by an algorithm described within $d$
characters.

We call such times random convergence times and the state
of the system $M_\delta$ a random state. It is important to notice
that the descriptive complexity of a random state must
also be high:

Lemma 7. Let $S$ be a convergent system with a complex
state $S(M_0, E, \delta)$. For every reasonably large $d$ we have
that

$$K(S(M_0, E, \delta)) > K(S) + K(E) + K(M_0) + d.$$

In other words, if $\delta$ has high descriptive complexity, then
there does not exist a reasonable algorithm that finds it even
if we have a complete description of the system and its
environment. It follows that the descriptive complexity of a
computable convergent state cannot be much greater than
the descriptive complexity of the system itself.
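
The search machine $P$ used in the argument for inequality (5) can be sketched as a simple loop over times. This is a hedged illustration only: the real convergence test involves the uncomputable measure $K$, so the toy decider below replaces it with a distance threshold, and the names `first_convergence_time`, `toy_system`, `make_toy_decider` and the `max_time` cut-off are our own additions.

```python
from itertools import count
from typing import Callable, Optional


def first_convergence_time(
    decide: Callable[[Callable, int, int, int], bool],
    system: Callable[[int, int, int], int],
    m0: int,
    env: int,
    max_time: Optional[int] = None,
) -> Optional[int]:
    """Run the decider over t = 0, 1, 2, ... and return the first accepted time."""
    for t in count(0):
        if max_time is not None and t > max_time:
            return None  # give up: the sketch must terminate, unlike P in general
        if decide(system, m0, env, t):
            return t


def toy_system(m0: int, env: int, t: int) -> int:
    """Toy evolution rule: the state climbs by one per step until it reaches env."""
    return min(m0 + t, env)


def make_toy_decider(eps: int) -> Callable[[Callable, int, int, int], bool]:
    """Stand-in for the decision algorithm: accept when the state is within eps of env."""
    def decide(system: Callable[[int, int, int], int], m0: int, env: int, t: int) -> bool:
        return abs(env - system(m0, env, t)) <= eps
    return decide
```

For example, `first_convergence_time(make_toy_decider(2), toy_system, 0, 7)` returns the first time at which the toy state is within distance 2 of the environment. The point of the argument in the text is that the search loop has constant size, so the time it returns is described by the decider, the system, the environment and the initial state alone.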

What counts as a reasonably large $d$ has been handled so far with
some ambiguity, as it represents the descriptive complexity of any
computable method $D_\epsilon$ that we might use to find convergence
times, which intuitively cannot be arbitrarily large. It is easy
to ‘cheat’ on inequality (5) by including in the description
of the program $D_\epsilon$ the full description of the convergence
time $\delta$, which is why we ask for reasonable descriptions.

Another question left to answer is whether complex convergence
times exist for a given limit $d$, considering that
the limits imposed by inequality (5) loosen in direct
relation to the descriptive complexity of $S$, $E$ and $M_0$.

The next result answers both questions by proving the ex¬ 
istence of complex convergent times for a broad characteri¬ 
zation of the size of d: 

Lemma 8 (Existence of Random Convergence Times). Let
$F$ be a total computable function. For any $\epsilon$, there exists
a system $S(M_0, E, t)$ such that the convergence times are
$F(S, M_0, E)$-random.

Let us focus on what the previous lemma is saying: $F$
can be any computable function. It can be a polynomial or
exponential function with respect to the length of given
descriptions of $M_0$ and $E$. It can also be any computable
theory that we might propose for setting an upper limit on the
size of an algorithm that finds convergence times given descriptions
of the system behaviour, environment and initial
state. In other words, for a class of dynamical systems, finding
convergence times, and therefore convergent states, is not
decidable even with complete information about the system and
its initial state.

Randomness of Convergence in Dynamic Environments

So far we have limited the discussion to fixed environments.
However, as observed in the physical world, the environment
itself can change over time. We call such environments dynamic
environments. In this section we extend the previous
results to cover environments that change depending on time
as well as on the initial state of the system. We also propose
a weaker convergence condition called weak convergence
and propose a necessary (but not sufficient) condition
for the computability of convergent times called descriptive
differentiability.

We can think of an environment $E$ as a dynamic computable
system, a moving target that also changes with time
and depends on the initial state $M_0$. In order for the system
to be convergent, we propose the same criterion: there must
exist $\delta$ such that $n > \delta$ implies

$K(E(M_0, n) \mid S(M_0, E(M_0, n), n)) < \epsilon. \quad (6)$

A system with a dynamic environment also meets
inequality 5 and lemmas 6 and 8, since we can describe a machine
that runs both $S$ and $E$ for the same time $t$.

Now with dynamic environments, $E$ is a moving target
and it is therefore convenient to consider an adaptation period
for the new states of $E$:

Definition 9. We say that $S$ converges weakly to $E$ if there
exists an infinity of times $\delta_i$ such that

$K(E(M_0, \delta_i) \mid S(M_0, E(M_0, \delta_i), \delta_i)) < \epsilon. \quad (7)$

As a direct consequence of inequality 5 and lemma 8
we have the following lemma:

Lemma 10. Let $S(M_0, E(M_0, t), t)$ be a weakly converging
system. Any decision algorithm $D_\epsilon(S, M_0, E, \delta_i)$ can only
decide the first non-random time.

As noted above, these results do not change when dynamic
environments are considered. In fact, we can think
of static environments as a special case of dynamic environments.
However, with different targets of adaptability and
convergence, it makes sense to generalize beyond the first
convergence time. Also, it should be noticed that specifying
a convergence index adds additional information that a
decision algorithm can potentially use.

Lemma 11. Let $S(M_0, E(M_0, t), t)$ be a weakly converging
system with an infinity of random times such that $k > j$
implies that $K(\delta_k) = K(\delta_j) + \Delta K_\delta(j, k)$, where $\Delta K_\delta$
is a (not necessarily computable) function with range on
the positive integers. If the function $\Delta K_\delta(i, i + m)$ is
unbounded with respect to $i$ then any decision algorithm
$D_\epsilon(S, M_0, E, i)$, where $i$ is the $i$-th convergence time, can
only decide a finite number of $i$'s.

One direct consequence of the previous lemma is that if a
sequence of times $\delta_1, \delta_2, \dots, \delta_i, \dots$ is decidable then for every
$m$ there exists a constant $c_m$ such that

$\Delta K_\delta(i, i + m) = K(\delta_{i+m}) - K(\delta_i) < c_m,$

which can be generalized as:

Definition 12. Let $\delta_1, \delta_2, \dots, \delta_i, \dots$ be a strictly growing
sequence of natural numbers. We define the descriptive
derivative of the natural mapping $\delta : i \mapsto \delta_i$ as

$\Delta K_\delta(m) = \min\{c : |K(\delta_{i+m}) - K(\delta_i)| < c, \; i \in \mathbb{N}\}. \quad (8)$

As a direct consequence of lemma 11, the existence of a
descriptive derivative is a necessary condition for the
computability of $\delta$; thus not meeting this property is sufficient,
but not necessary, for undecidability. Therefore the
non-existence of a descriptive derivative is a stronger condition
than undecidability, which we will call non-descriptive
differentiability.
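
One way to see why a computable sequence admits a descriptive derivative is sketched below for prefix-free complexity $K$. The constants $c'$, $c''$ and $c_\delta$ are names introduced here for illustration; they are not part of the original text.

```latex
\begin{align*}
K(\delta_i) &= K(i) + O(1)
  && \text{($\delta$ computable and strictly growing, so $i$ and $\delta_i$ determine each other)}\\
K(i+m) &\le K(i) + K(m) + c' \\
K(i)   &\le K(i+m) + K(m) + c'' \\
\bigl|K(\delta_{i+m}) - K(\delta_i)\bigr| &\le K(m) + c_\delta
  && \text{for every } i
\end{align*}
```

Since the last bound does not depend on $i$, the minimum in equation (8) exists for each $m$ under these assumptions.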

Definition 13. We say that a sequence of times
$\delta_1, \delta_2, \dots, \delta_i, \dots$ is non-descriptively differentiable if
$\Delta K_\delta(m)$ is not a total function.

Irreducibility of Descriptive Time Complexity 

In the previous section, it was established that time was the
main factor in the descriptive complexity of the states within
the evolution of a system. This result is expanded by the
time complexity stability theorem (theorem 14). This theorem
establishes that, within an algorithmic descriptive complexity
framework, similarly complex initial states must evolve into
similarly complex future states over similarly complex time
frames, effectively erasing the difference between the
complexity of the state of the system and the complexity of the
corresponding time and establishing absolute limits to the
reducibility of future states.

Let $F(t) = T(S(M_0, E, t))$ be the real execution time
of the system at time $t$. By using our time counting machine
U H it is easy to see that $F(t)$ is computable and, by
uniqueness of successor state, $F$ increases strictly with $t$,
and hence it is injective. Consequently, $F$ has a computable
inverse $F^{-1}$ over its image. Therefore, we have
that (up to a small constant) $K(F(t)) \le K(F) + K(t)$
and $K(t) \le K(F^{-1}) + K(F(t))$. It follows that $K(t) =
K(F(t)) + O(c)$, where $c$ is an integer independent of $t$ (but
that can depend on $S$). In other words, for a fixed system $S$,
the execution time and the system time are equally complex
up to a constant. From here on we will not differentiate
between the complexity of both times. A generalization of the
previous equation is given by the following theorem:


Theorem 14 (Time Complexity Stability). Let $S$ and $S'$ be
two computable systems and $t$ and $t'$ the first times where
each system reaches the states $M_t$ and $M'_{t'}$, respectively.
Then there exists $c$ such that $|K(M_t) - K(t)| < c$ and
$|K(M'_{t'}) - K(t')| < c$.
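
The step from the two inequalities stated before Theorem 14 to the equality between $K(t)$ and $K(F(t))$ can be spelled out as follows. The constants $c_1$ and $c_2$ are labels introduced here for the "small constants" mentioned in the text.

```latex
\begin{align*}
K(F(t)) &\le K(F) + K(t) + c_1 \\
K(t)    &\le K(F^{-1}) + K(F(t)) + c_2 \\
\bigl|K(t) - K(F(t))\bigr| &\le \max\{\, K(F) + c_1,\; K(F^{-1}) + c_2 \,\}
\end{align*}
```

The right-hand side of the last line depends on the system $S$ through $F$ but not on $t$, which is the sense in which the two times are equally complex up to a constant.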

Beyond Halting States: Open-Ended Evolution 

Inequality 5 states that being able to predict or recognize
adaptation imposes a limit to the descriptive complexity of
the first adapted state. A particular case is the halting state,
as shown in the proof of lemma 8. By lemma 3 this result
holds for any complexity measure. In this section we extend
the lemma to continuously evolving systems, showing
that computability of adapted times limits the complexity of
adapted states beyond the first, imposing a limit to open-ended
evolution for three complexity measures: sophistication,
coarse sophistication and busy beaver logical depth.

For a system in constant evolution converging to a dynamic
environment, lemma 11 imposes a limit to the
growth of descriptive complexity of a system with computable
adapted states: if the growth of the descriptive complexity
of a sequence of convergent times is unbounded in the
sense of definition 13 then all but a finite number of times are
undecidable. The converse would be convenient; however, it
is not always true. Moreover, the next series of results shows
that imposing such a limit would impede strong OEE:

Theorem 15. Let $S$ be a non-cyclical computable system
with initial state $M_0$, $E$ a dynamic environment and
$\delta_1, \dots, \delta_i, \dots$ a sequence of times such that for each $\delta_i$ there
exists a total function $p_i$ such that $p_i(M_{\delta_i}) = E(\delta_i)$. If
the function $p : i \mapsto p_i$ is computable, then the function
$\delta : i \mapsto \delta_i$ is computable.

The last result can be applied naturally to weakly convergent
systems (definition 9): the way each adapted state approaches
$E$ is unpredictable; in other words, its behaviour changes
over different stages unpredictably. Formally:

Corollary 16. Let $S(M_0, E, t)$ be a weakly converging system
with adapted states $M_{\delta_1}, \dots, M_{\delta_i}, \dots$ and $p_1, \dots, p_i, \dots$
their respective adapted behaviour. If the mapping $\delta :
i \mapsto \delta_i$ is non-descriptively differentiable then the function
$p : i \mapsto p_i$ is not computable.

While asking for totality might look like an arbitrary
limitation at first glance, the reader should recall that in weakly
convergent systems the program $p_i$ represents an organism, a
theory or other computable system that uses the information in
$M_{\delta_i}$ to predict the behaviour of $E(\delta_i)$, and if this prediction
does not process its environment in a sensible time frame
then it is hard to argue that it represents an adapted system
or a useful theory.

The intuition behind classifying descriptively differentiable
adapted time sequences as less complex is better
explained by borrowing ideas developed by Bennett and Koppel
within the framework of logical depth (Bennett, 1988)
and sophistication (Koppel, 1988), respectively. Their argument
states that random strings are as simple as very regular
strings, given that there is no complex underlying structure
in their minimal descriptions. The intuition that random
objects contain no useful information leads us to the same
conclusion. And thanks to theorem 14, the states must retain a
high degree of randomness for random times.

Logical depth works under the assumption that complex
or deep natural numbers take a long time to compute from
near minimal descriptions. Conversely, random or incompressible
strings are shallow since their minimal descriptions
must contain the full description verbatim. For this work we
will use a variation of logical depth called busy beaver logical
depth, denoted by $depth_{bb}(x)$.

Sophistication is a measure of useful information within
a string proposed by Koppel. The idea behind it is to divide
the description of a string $x$ into two parts: the program that
represents the underlying structure of the object and the input,
which is the random or structureless component of the
object. This function is denoted by $soph_c(x)$, where $c$ is a
natural number representing the significance level.

Now, the images of a mapping $\delta : i \mapsto \delta_i$ already have the
form $\delta(i)$, where $\delta$ and $i$ represent the structure and the
random component respectively. Random strings should hold
strongly this structure up to a logarithmic error, which is
proven in the next lemma.

Lemma 17. Let $\delta_1, \dots, \delta_i, \dots$ be a sequence of different natural
numbers and $r$ a natural number. If the function $\delta : i \mapsto
\delta_i$ is computable then there exists an infinite subsequence
where the sophistication is bounded up to a logarithm of a
logarithmic term of their indexes.

Now, small changes in the significance level of sophistication
can have a large impact on the sophistication of a given
string. Another possible issue is that the constant proposed
in lemma 17 could appear to be large at first (but it becomes
comparatively smaller as $i$ grows). A robust variation of
sophistication called coarse sophistication (Antunes and Fortnow,
2003) incorporates the significance level as a penalty.
The definition presented here differs slightly from theirs in
order to maintain congruence with the chosen prefix-free
universal machine and to avoid negative values. This measure
is denoted by $csoph(x)$.

With a similar argument as the one used to prove lemma 
17, it is easy to show that coarse sophistication is similarly 
bounded up to a logarithm of a logarithmic term. 

Lemma 18. Let $\delta_1, \dots, \delta_i, \dots$ be a sequence of different natural
numbers and $r$ a natural number. If the function
$\delta : i \mapsto \delta_i$ is computable then there exists an infinite
subsequence where the coarse sophistication is bounded up to
a logarithm of a logarithmic term.

Another proposed measure of complexity is Bennett’s
logical depth, for which the next result follows from a theorem
found by Antunes and Fortnow (2003) and lemma 18.

Corollary 19. Let $\delta_1, \dots, \delta_i, \dots$ be a sequence of different
natural numbers and $r$ a natural number. If the function
$\delta : i \mapsto \delta_i$ is computable then there exists an infinite
subsequence where the busy beaver logical depth is bounded up
to a logarithm of a logarithmic term of their indexes.

Let us focus on the consequences of lemmas 17 and 18
and corollary 19. Given the relationship established between
descriptive time complexity and the corresponding state of
a system (theorem 14), these last results imply that either
the complexity of the adapted states of a system (using any
of the three complexity measures) grows very slowly for an
infinite subsequence of times (becoming increasingly common
up to a probability of 1 (Calude and Stay, 2006)) or the
subsequence of adapted times is undecidable. Formally:

Theorem 20. If $S(M_0, E, t)$ is a weakly converging system
with adaptation times $\delta_1, \dots, \delta_i, \dots$ then there exists a constant
$c$ such that, if $csoph$, $depth_{bb}$ or $soph_c$ show strong OEE
that grows faster than $O(\log \log i)$ then the mapping $\delta : i \mapsto
\delta_i$ is not computable.

Technically theorem 20 does not impose undecidability on
strong OEE. However, the growth rate that decidability allows
is extremely slow. If we disregard this increasingly
insignificant growth rate, we can say that strong open-ended
evolution implies undecidability of the adapted states.

Furthermore, by corollary 16, the behaviour and interpretation
of the system evolves in an unpredictable way, establishing
one path for emergence: a set of rules that cannot
be reduced to an initial set of rules. Recall that for a given
weakly converging dynamical system, the sequence of programs
$p_i$ represents the behaviour or interpretation of each
adapted state $M_{\delta_i}$. If a system exhibits strong OEE with
respect to the complexity measures $soph_c$, $csoph$ or $depth_{bb}$,
by corollary 16 and theorem 20 the sequence of behaviours
is uncomputable and therefore irreducible to any function
of the form $p : i \mapsto p_i$, even when possessing complete
descriptions for the behaviour of the system, its environment
and its initial state.

Conclusions 

We have presented a formal and general mathematical model
for adaptation within the framework of computable dynamical
systems. This model exhibits universal properties for all
computable dynamical systems, of which Turing machines
are a subset.

Among other results, we have given formal definitions of
open-ended evolution (OEE) and strong open-ended evolution.
We also showed that decidability imposes universal
limits to the growth of complexity of computable systems as
measured by sophistication, coarse sophistication and busy
beaver logical depth. Furthermore, as a direct implication
of corollary 16 and theorem 20, undecidability of adapted
states and unpredictability of the behaviour of the system
at each state are requirements for a system to exhibit strong
open-ended evolution (up to an $O(\log \log t)$ term) with
respect to the complexity measures known as sophistication,
coarse sophistication and busy beaver logical depth,
establishing a rigorous proof that undecidability and irreducibility
of future behaviour are requirements for the growth of
complexity among the class of computable dynamical systems.
Abstract

Because the kind of open-ended complexity explosion seen
on Earth remains beyond the observed dynamics of current
artificial life worlds, it has become critical to isolate and
investigate specific factors that may contribute to open-endedness.
This paper focuses on one such factor that has previously
received little attention in research on open-endedness: the
minimal criterion (MC) for reproduction. Originally proposed
as an enhancement to novelty search, the MC is in effect a
different abstraction of evolution than the more conventional
competition-focused fitness-based paradigm, instead focusing
on the minimal task that must be completed for an organism
to be allowed to produce offspring. The MC is interesting
for studying open-endedness because in principle its
strictness (i.e. how hard it is to satisfy) can be varied on a
continuum to observe its effects. While in many artificial life
worlds the MC strictness is implicit and therefore difficult to
vary systematically, in the previously-introduced Chromaria
world, the MC is designed to be set explicitly by the
experimenter, making possible the systematic study of different
levels of MC strictness in this paper. The main result, supported
by visual, quantitative, and qualitative observations, is that
the strictness of the MC can profoundly affect open-ended
dynamics, ultimately deciding between complete stagnation
(both with extreme strictness or complete relaxation) and
orderly divergence. This result offers a lesson of particular
importance to worlds whose MCs are not explicit by exposing
an area of sensitivity within open-ended systems that is easy
to overlook because of its implicit nature.

Introduction 

Artificial life (alife) offers a unique opportunity to replay
the tape of life (Gould, 1989) and thereby empirically test
hypotheses about phenomena observed in nature. In particular,
alife worlds offer an ideal platform for exploring
and controlling for mechanisms of interest in the domain of
open-ended evolution (OEE) (Channon and Damper, 2000;
Lehman and Stanley, 2015; Miconi and Channon, 2005;
Ofria and Wilke, 2004; Ray, 1992; Soros and Stanley, 2014;
Spector et al., 2007; Yaeger, 1994). Though a universally
satisfying definition of OEE has been elusive (Bedau et al.,
1998; Channon, 2003, 2006; Juric, 1994; Maley, 1999),
open-ended systems should at minimum not stagnate or converge.
Biological evolution is widely considered an open-ended
process, exhibiting 3.8 billion years of diversification
and increasing complexity that continues today (Miconi,
2008). A major challenge to investigating this phenomenon
scientifically is of course that experiments on computers
must be tractable within much shorter timespans. Given that
experimental models necessarily entail such unnatural constraints
on time (in addition to space), the need to induce a
rate of evolution that achieves natural-seeming dynamics is
paramount. However, little is yet known about how to press the
delicate levers of evolution to control its speed and
open-endedness.

One such lever is the minimal criterion (MC) for reproduction,
which in effect means the minimal function an organism
must perform for its lineage to continue. As an example,
on Earth the MC is in effect to survive and physically
create a copy of oneself. However, the MC on Earth
is really only one of a vast spectrum of possibilities. For
example, in the recent Chromaria alife world (Soros and Stanley,
2014) the MC is simply to plant oneself somewhere that has
colors similar to one’s own morphology (which then will cause
an offspring to be created without any need for an evolved
reproductive apparatus). This general concept of MC began
to appear in evolutionary computation literature in an
algorithm called minimal criterion novelty search (MCNS)
(Lehman and Stanley, 2010) as a low-level alternative abstraction
to fitness-centric evolution. Its first appearance in
OEE was in Chromaria.

The attractions of the MC for the purpose of developing
a more fundamental understanding of OEE are threefold.
First, it allows fitness, which is usually explicitly formalized
as a score on a continuum, to become an implicit side effect
of a deeper evolutionary principle. Second, by abstracting
away the metric of fitness, it enables domains radically different
from Earth to be tested in the context of OEE while
still maintaining a parallel with natural evolution. Third, the
MC becomes just the kind of lever that can be calibrated to
alter the overall dynamics of long-term evolution. That is,
it is possible to alter its strictness. On Earth, this notion of
strictness is implicit in the difficulty of surviving to the point
of physically making a copy of oneself, but in other conceivable
worlds, the difficulty of satisfying the MC can be made
explicit, allowing the experimenter literally to tune it precisely
and observe the consequence. Chromaria offers such
an opportunity through its color-matching MC, the strictness
of which can be tuned straightforwardly.

Experiments in this paper will reveal that the strictness
of the MC in OEE has an effect similar to that of selection
pressure in traditional evolutionary algorithms - the more strict,
the more convergent evolution tends to become. At the other end
of the spectrum, the MC can allow the world to degrade into
chaos when not strict enough. However, the strictness of the
MC interestingly does not correspond to the proportion of
the population that is allowed to reproduce; in fact the MC
can be very strict even as many still succeed in satisfying
it (and thereby reproducing). At the same time, when set
too strictly it becomes a force for convergence and hence an
obstacle to OEE. As results in this study will show, the MC is
instrumental in striking the right balance between order and
chaos - a sweet spot for open-endedness made accessible to
investigation by the explicit MC in Chromaria.

The implications of the results speak both to how to steer 
OEE towards desired dynamics and also to the delicacy of 
the balance that exists in open-ended systems such as nature 
where the MC is implicit. Stray too far from this golden 
balance, and the world can either stagnate or explode into 
meaningless chaos - a prescription worth considering for 
any future attempts to achieve OEE. 

Background 

To understand the significance of MC-based evolution, it is
important first to understand the difference between reproduction
(the passive generation of an offspring, usually as a
reward for good performance) and self-replication (the active
assembly of a copy of oneself). On Earth these processes
are conflated because biological self-replication is
the sole means of reproduction. However, in alife worlds,
an evolutionary algorithm can allow an individual to reproduce
even if the individual evolves no mechanism for
self-replication. From an algorithmic point of view in such
systems, traversing genotype space is all that really matters.
Abstracted in this way, the MC can be unfastened from
the familiar imperative of self-replication because the MC
can in principle be to achieve something other than
self-replication, which would still then lead to reproduction.

The idea of a MC distinct from self-replication first arose
in the context of novelty search (Lehman and Stanley, 2011),
an evolutionary algorithm that ignores fitness and instead
searches only for behavioral novelty. While pure novelty
search proved effective in closed domains with limited possible
behavioral trajectories (such as for robots in an enclosed
maze), Lehman and Stanley (2010) observed that the
method gets lost in spaces that encompass vast regions of
degenerate behaviors (such as when robots can wander outside
the maze). To address this problem, they proposed adding a
MC that everyone in the population must satisfy by displaying
a minimal level of competence (such as staying within
the maze) to be even considered for further selection based
on novelty. In this way, filtering selection through a MC allows
the population to stay focused on the interesting part of
the search space.
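
The filtering idea described above can be illustrated with a minimal sketch. It is not the MCNS implementation of Lehman and Stanley: the function names (`novelty`, `select_mcns`), the use of mean distance to the `k` nearest neighbours as the novelty score, and the omission of a novelty archive are all simplifying assumptions made here for illustration.

```python
import math


def novelty(behavior, others, k=3):
    """Mean Euclidean distance to the k nearest behaviors (assumed score)."""
    dists = sorted(math.dist(behavior, other) for other in others)
    nearest = dists[:k]
    return sum(nearest) / len(nearest) if nearest else 0.0


def select_mcns(population, behavior_of, meets_mc, n_select, k=3):
    """Keep only individuals that satisfy the MC, then rank them by novelty.

    population  : list of individuals (any objects)
    behavior_of : hypothetical function mapping an individual to a point
    meets_mc    : hypothetical predicate, e.g. "stayed within the maze"
    """
    # Step 1: the minimal criterion acts as a hard filter.
    viable = [ind for ind in population if meets_mc(ind)]

    # Step 2: novelty is computed only among the viable individuals.
    scored = []
    for ind in viable:
        others = [behavior_of(o) for o in viable if o is not ind]
        scored.append((novelty(behavior_of(ind), others, k), ind))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ind for _, ind in scored[:n_select]]
```

For instance, if behaviors are final robot positions and the MC is "the position lies inside the maze", a robot that wandered far outside would be very novel but is never considered, which is the effect the paragraph describes.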

However, as emphasized by Lehman and Stanley (2010),
the MC offers more than just practical utility. More significantly,
it is an abstraction of a fundamental aspect of evolution
that is not captured conceptually by traditional
fitness-centric models. The shift in focus from performance-based
selection in nature to surviving long enough literally to
manufacture a rough copy of oneself transforms the view of
evolution from emphasizing optimization instead into a picture
of a kind of tinkerer that continually discovers new ways
to do the same thing (i.e. satisfy the MC), reminiscent of
the concepts of adaptive radiation (Schluter, 2000) and neutral
networks (Wagner, 2011). This perspective merits further
investigation because it provides a potentially fruitful
alternative view of natural evolution largely orthogonal to
the usual competition-centric and fitness-inspired interpretations,
thereby opening up new avenues for investigation.

Of course, this alternate abstraction need not subsume the
fitness-based view. Rather, it provides another angle for
investigating evolutionary mechanics as demonstrated by
experiments with the recent Chromaria alife world. The original
paper introducing Chromaria (Soros and Stanley, 2014)
hypothesizes that OEE requires a non-trivial MC and that
self-replication is just one of many possibilities for such
non-triviality. Chromaria becomes a test for this hypothesis
by driving evolution with an explicit MC (planting in a
region of similar color to oneself) that is not self-replication.
Moreover, because the threshold for satisfying this MC (i.e.
the degree of similarity required to plant) can be varied,
Chromaria provided for the first time a world in which the
strictness of the MC can be isolated and directly manipulated
to see its effects. Presumably, the difficulty of satisfying the
MC would have a profound effect upon the consequent
evolutionary dynamics. Yet no studies thus far have explored
how to adjust the difficulty of satisfying the MC to strike
just the right balance between order and chaos in open-ended
alife worlds. The consequent lack of understanding is problematic
because, as the remainder of this section will demonstrate,
the reproductive mechanisms in many existing alife
worlds incorporate a form of MC even if it is not defined as
such explicitly.

For example, in Avida (Ofria and Wilke, 2004), individuals
are encoded as software programs and the MC, as
on Earth, is to self-replicate. However, Avida additionally
rewards individuals that evolve to perform certain computational
tasks by allowing them to loop through their
evolved programs (and thereby potentially reproduce) more
frequently than others can. When an individual reproduces
in this way, its offspring replaces a preexisting individual
in the population. Thus the MC of self-replication is made
more difficult by introducing the threat of effectively random
replacement.

Figure 1: Chromaria. The initial Chromarian is born at
the center of the world and then must find an appropriate
place to plant. Subsequent individuals are born wherever
their parents planted. The color-rich borders initially provide
the only viable options, but more emerge as Chromarians
continue to thrive in the environment.

Polyworld (Yaeger, 1994) also implements an Earth-like
MC by removing individuals from the population if they fail
to survive long enough to mate. In contrast with Avida,
Polyworld does not award fitness bonuses. However, Polyworld
creatures are given a primitive fighting behavior that
can be used to kill other creatures and thereby generate food.
In this way, selection incentivizes the evolution of adversarial
behaviors, which in turn increases the difficulty of meeting
the survival-based MC.

The reproductive mechanisms implemented in Avida and
Polyworld are typical of those in other alife worlds, which
frequently include similar nature-inspired mechanisms that
effectively increase the difficulty of satisfying the MC. However,
alife is not bound by nature’s constraints and thus admits
a variety of alternate MCs that can be more explicitly
quantified and manipulated. As a result, alife opens up the
opportunity to learn via experimentation the largely unexplored
implications of varying MC strictness. The next section
describes the alife world of Chromaria, which serves
as the domain in this paper for experiments varying the MC
strictness.

Chromaria setup 

Chromaria¹ (Soros and Stanley, 2014) is an alife world
explicitly designed for testing theories about the necessary
conditions for OEE (Figure 1). In deliberate contrast with
many other alife worlds, Chromaria is designed without an
explicit notion of competition or relative fitness. Instead,
Chromarians qualify for reproduction (via an evolutionary
algorithm) by satisfying the unique MC of navigating the
world and finding an appropriate place to plant themselves.

¹ Source code is available at http://github.com/lsoros/chromaria.

Each Chromarian’s morphology is a two-dimensional image
composed of RGB pixels. To facilitate intelligent navigation
of the world, each Chromarian is equipped with a
rectangular visual field consisting of 100 RGB sensors centered
at the forefront of the Chromarian’s body. Half of
these sensors fall underneath the body and the rest extend
in front of the creature. The exact resolution of the visual
field depends on the creature’s morphology; as body
length and width increase, the distance between neighboring
sensors grows. The visual field is complemented by a
heading-sensitive compass consisting of 8 pie slice sensors.
All sensor values are scaled and then input to an evolved
multimodal neural controller (Figure 2). The output layer
of this network has four effector nodes corresponding to the
Chromarian’s rotation (L and R), speed (S), and inclination
to plant itself (P). If the P node’s activation level exceeds a
constant threshold, the Chromarian is immobilized and the
function described in Figure 3 calculates a matching value
quantifying how closely the creature’s morphology matches
the colors already on the ground at the requested planting
location. The attempt succeeds (and the MC is thereby considered
met) if the matching value exceeds a configurable
threshold (r), which in effect thereby controls the strictness
of the MC. Successful planting thus requires evolving a synergistic
combination of morphology and behavior.
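
The color-matching check can be sketched in a few lines, following the binning procedure given in the Figure 3 caption. This is an illustrative reconstruction, not the released Chromaria code: the function names, the use of absolute differences, and the strict cut at 127.5 are assumptions, and the comparison direction (value below the threshold passes) follows the Figure 3 caption.

```python
from collections import Counter

# Each channel is "off" (0) in the lower half of [0, 255] and "on" (1) otherwise.
BIN_NAMES = {
    (0, 0, 0): "black", (1, 1, 1): "white",
    (1, 0, 0): "red", (0, 1, 0): "green", (0, 0, 1): "blue",
    (1, 1, 0): "yellow", (0, 1, 1): "cyan", (1, 0, 1): "magenta",
}


def bin_pixel(rgb):
    """Map an (R, G, B) pixel to one of the eight color bins."""
    r, g, b = rgb
    return BIN_NAMES[(int(r > 127.5), int(g > 127.5), int(b > 127.5))]


def color_ratios(pixels):
    """Fraction of pixels falling into each bin (pixels must be non-empty)."""
    counts = Counter(bin_pixel(p) for p in pixels)
    total = len(pixels)
    return {name: counts.get(name, 0) / total for name in BIN_NAMES.values()}


def matching_value(morphology, sensor_field):
    """Sum over bins of the (assumed absolute) difference in color ratios."""
    m = color_ratios(morphology)
    s = color_ratios(sensor_field)
    return sum(abs(m[name] - s[name]) for name in m)


def mc_satisfied(morphology, sensor_field, threshold):
    """Assumed reading of the caption: a smaller summed difference passes."""
    return matching_value(morphology, sensor_field) < threshold
```

Under this reading, an all-red body over all-red ground gives a value of 0, while the same body over ground that is half red and half black gives 1.0, so lowering `threshold` makes the MC stricter.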



[Figure 2 input planes: R (10x10), G (10x10), B (10x10), Heading]


Figure 2: Behavioral controller. All sensor values are
scaled and then input to an evolved multimodal neural controller.
Each plane represents an array of sensors or neurons.
Arrows between planes in this schematic denote sets of
connections between one plane and another. The input layer
contains three individual color fields, and an additional set
of heading inputs. The four output nodes control movement
and planting behaviors (which enable the Chromarian to satisfy
the MC). The maximum number of connections in this
network (evolved by HyperNEAT) is 30,448.

Figure 3: Boolean MC function in Chromaria. Each pixel
of both the Chromarian’s morphology and sensor field (an
area of ground overlapping the front of the Chromarian)
is placed into one of eight bins: black, white, red, green,
blue, yellow, cyan, or magenta. Here, a simple morphology
is shown to the left of its binned equivalent and a histogram
of the bins. The bins are defined by halving the range
[0, 255] that each of the R, G, and B component values can
take. For instance, any pixel with $R \in [0, 255/2]$ (more
non-red than red), $G \in [0, 255/2]$ (more non-green than
green), and $B \in [0, 255/2]$ (more non-blue than blue) falls
into the black bin because black has values R,G,B = 0,0,0.
Once every pixel is binned in this way, color ratios are
calculated for each bin by dividing the bin size by the total
pixel count. Ratios are recorded for both the morphology and
sensor field. The differences between these ratios for each
color are summed to get a matching value. If this value is
less than a configurable threshold r (effectively the
threshold for satisfying the MC), the function is satisfied.

Figure 4: Morphology-encoding CPPN and decoding
process. The CPPN on the left is decoded by iteratively
activating the network with pairs of polar coordinates r and θ
as input and using the resulting outputs to draw the morphology
pixel by pixel. Upon activation, the CPPN returns an r_max
for each value of θ, which determines how far the perimeter
of the Chromarian’s body extends at that angle. The CPPN is
then activated again for every r on the interior of this border
to get the corresponding RGB values. In this way the CPPN
determines both the shape (via the r_max output) and internal
color (via the RGB outputs) of the Chromarian. These
characteristics ultimately determine where the Chromarian
can successfully plant and thereby satisfy the MC.

In the unconventional MC-driven main loop in Chromaria,
the Chromarians that have successfully planted most
recently are kept in a parent queue capped at 100 individuals.
The parent queue initially contains only a single seed
Chromarian, which is found by preliminary search (Soros
and Stanley, 2014) and thereby guaranteed to successfully
plant. The initial seed individual is born at the center of the
world and navigates to plant on the color-rich border. Sub¬ 
sequent offspring are born on top of their parents and then 
must similarly move away from their spawn point to find an 
appropriate place to plant. To prevent individuals who at¬ 
tempt trivially to plant without moving from reproducing, 
planting attempts made during the first 25 ticks of an in¬ 
dividual’s life are automatically invalidated. If an individual 
either makes an invalid planting attempt or fails to plant alto¬ 
gether, the Chromarian is removed from the world and does 
not generate any children. However, if a Chromarian suc¬ 
ceeds in moving to an appropriate location and then requests 
to plant, its body remains frozen and thereby supersedes all
other pixels at that location. Its offspring is then generated 
(because anyone who satisfies the MC is given an offspring) 
based on the reproductive mechanics of HyperNEAT (Stan¬ 
ley et al., 2009). Accordingly, the genotype used in this sys¬ 
tem is a compositional pattern producing network (CPPN; 
Stanley 2007), an indirect encoding that generates patterns 
with regularities seen in nature such as symmetry, repeti¬ 
tion, and repetition with variation. Separate CPPNs are used 
to encode the morphology (Figure 4) and neural controller. 
Parameters for evolution are included with the source code. 
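
The Boolean MC function summarized in the Figure 3 caption can be sketched in a few lines of Python. This is a minimal illustration rather than the Chromaria source: the function names, the representation of pixels as (R, G, B) tuples, and the `max_mismatch` parameter are all assumptions made here, and the text does not specify how the percentage values of r used in the experiments map onto the summed difference.

```python
from collections import Counter

# Each channel is halved: 0 = lower half of [0, 255], 1 = upper half.
BIN_NAMES = {
    (0, 0, 0): "black", (1, 1, 1): "white",
    (1, 0, 0): "red", (0, 1, 0): "green", (0, 0, 1): "blue",
    (1, 1, 0): "yellow", (0, 1, 1): "cyan", (1, 0, 1): "magenta",
}


def bin_pixel(rgb):
    """Place one (R, G, B) pixel into one of the eight color bins."""
    return BIN_NAMES[tuple(int(c > 255 / 2) for c in rgb)]


def color_ratios(pixels):
    """Fraction of pixels falling into each bin (bin size / pixel count)."""
    counts = Counter(bin_pixel(p) for p in pixels)
    total = len(pixels)
    return {name: counts[name] / total for name in BIN_NAMES.values()}


def mismatch(morphology, sensor_field):
    """Sum over bins of the difference between the two sets of ratios."""
    rm, rf = color_ratios(morphology), color_ratios(sensor_field)
    return sum(abs(rm[name] - rf[name]) for name in rm)


def satisfies_mc(morphology, sensor_field, max_mismatch):
    """Hypothetical planting test: succeed when the mismatch is small enough."""
    return mismatch(morphology, sensor_field) < max_mismatch
```

Under this reading, a morphology and a sensor field with identical color histograms have a mismatch of 0, and two histograms with no bins in common have a mismatch of 2.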

Note that reproductive dynamics in Chromaria are un¬ 
like those in many other alife worlds. To both allow the 
steady state evolutionary algorithm to explore multiple lin¬ 
eages simultaneously and to guarantee that all Chromari- 
ans who successfully plant eventually get to reproduce, off¬ 
spring who satisfy the MC are placed at the end of the parent 


queue and only reproduce once they reach the front of the 
queue. (The oldest parent in the queue is removed to make 
room for the new offspring if the queue is at maximum ca¬ 
pacity.) After reproducing, if it is not the oldest, the Chro¬ 
marian at the front of the queue is sent to the back again. 
Thus the individual active in the world is not always the 
child of the individual that was active just before it. Though 
potentially unintuitive, this process forces the system to al¬ 
low every preexisting member of the population to repro¬ 
duce before a newcomer. As a result, the explicit competi¬ 
tion so central to many alife worlds is intentionally absent 
from Chromaria. Note that if offspring of enough parents 
in the queue fail to plant, some parents may receive more 
than one opportunity to reproduce when they advance again 
to the front of the line. 
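
The queue mechanics described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual Chromaria main loop: `Individual`, `new_individual`, and `reproduction_step` are hypothetical names, reproduction is replaced by a stand-in that only records birth order, and the relative order in which the child and the surviving parent rejoin the queue is a guess, since the text does not fix it.

```python
from collections import deque
from dataclasses import dataclass
from itertools import count

_births = count()


@dataclass(frozen=True)
class Individual:
    birth: int  # order of creation; a smaller value means an older individual


def new_individual():
    """Stand-in for HyperNEAT reproduction: only the birth order is tracked."""
    return Individual(next(_births))


def reproduction_step(queue, plants_successfully, capacity=100):
    """Let the individual at the front of the parent queue reproduce once."""
    parent = queue[0]
    child = new_individual()
    if plants_successfully(child):
        if len(queue) >= capacity:
            # Make room by removing the oldest parent in the queue.
            queue.remove(min(queue, key=lambda ind: ind.birth))
        queue.append(child)
    if queue[0] is parent:
        # The parent was not evicted, so it goes to the back of the queue.
        queue.rotate(-1)
    return child
```

Note that when every offspring fails to plant, the queue simply rotates, which is how a parent can receive more than one opportunity to reproduce.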

Chromaria is chosen as the domain for experiments in 
this paper precisely because it allows the MC to be studied 
in isolation from competition and other selective pressures 
(everyone who satisfies the MC is guaranteed an offspring). 
Additionally, the difficulty of satisfying the MC is easily ad¬ 
justed by simply setting a different r for the planting func¬ 
tion described in Figure 3. An interesting question that will 
be clarified by these experiments is what features of open- 
endedness can be achieved when more conventional com¬ 
petitive mechanisms are absent. However, the hope is that 
the results will inform the design of systems that do involve 
more intricate evolutionary pressures.

Experiment

One aim of this study is to develop an intuition for how the 
MC affects the balance of order and chaos in potentially 
open-ended evolutionary systems. For this reason, experi¬ 
mental runs are carried out through 350,000 reproductions, 
each lasting 5 to 7 days (wall clock time). Fifty runs were 
performed in total, each starting from the same initial seed 
and varying only r. Twenty baseline runs were performed 
with a moderate r of 87.5%, which is the same match¬ 
ing threshold implemented in previous experiments by Soros 
and Stanley (2014). Additionally, the MC was made more 
strict by increasing r to 95% and made more forgiving by 
decreasing it to 75%. Ten runs were performed with each of 
these thresholds. 

A related aim of this study is to investigate one of the hy¬ 
pothesized necessary conditions for OEE proposed by Soros 
and Stanley (2014). Though artificial life provides a promis¬ 
ing platform for answering the question of exactly what is 
necessary for OEE, surprisingly few empirical studies have 
been performed to test the validity of the few frameworks 
proposed thus far (i.e. by Waddington (1969) and Taylor 
(2004)). According to the hypothesis that the MC is nec¬ 
essary for OEE (Soros and Stanley, 2014), evolving systems 
should become degenerate if every individual is allowed to 
reproduce. This hypothesis in effect addresses the extreme 
case wherein the MC is so relaxed that there is no MC (the 
control). To this end, an additional set of ten control runs is
performed wherein every individual generates an offspring 
regardless of what it does during its lifetime. Note that this 
control setup is slightly more complicated than simply set¬ 
ting r to 0% because planting attempts during the first 25 
ticks of an individual’s life are normally invalidated, but this 
restriction is removed in the control case. Individuals that 
never ask to plant also generate offspring (after timing out) 
in the control case, though only the bodies of Chromarians 
who actively ask to plant remain visible in the world after 
their lifetime ends. 

Results 

Chromaria’s visual design allows evolutionary dynamics to 
be observed in an intuitive, engaging manner. Harnessing 
this capability, sequences of screenshots up to 350,000 re¬ 
productions demonstrate typical effects of varying r to pro¬ 
duce different MC strictness levels (Figure 5). The story 
told by these sequences illustrates the profound effect of MC 
strictness on OEE. At one end of the spectrum, when the 
MC is strict, the world only slightly changes over hundreds 
of thousands of reproductions. Interestingly, the extreme 
opposite end of the spectrum, when there is no MC at all, 
while initially chaotic, also ends up descending into stagna¬ 
tion because the entire breeding population eventually loses 
the ability even to request to plant. It is only at the middle 
ranges that consistent and coherent change (both in terms of 
morphology and behavior) is readily observable. 


Of course, screenshots offer only a static portrait of life 
in Chromaria and cannot capture the diversity and com¬ 
plexity of individual Chromarians’ behaviors. For this 
reason, videos (and additional screenshots) are available 
at http://eplex.cs.ucf.edu/chromaria-resources/. Observing 
these dynamic worlds reveals that the most interesting be¬ 
haviors are expressed when the environment contains a va¬ 
riety of simple RGB signals. When the MC is strict, little 
deviation from the simple dichromatic initial seed is toler¬ 
ated, and as a result the world becomes filled with strong 
magenta, blue, and white signals. Individuals evolve behav¬ 
iors such as turning left when a magenta/blue edge appears 
and then planting when a patch of whiteness (such as at 
the frontier created by new planters) is encountered. When 
the MC is moderate, environmental complexity increases as 
a result of Chromarians with more complex morphologies 
successfully planting. However, the color-based signals in 
the environment become harder to differentiate and individ¬ 
uals begin to rely on heading (which provides a more reli¬ 
able signal) to affect planting behaviors. This phenomenon 
is readily observed in the later stages of the runs with mod¬ 
erate MC; though some individuals visibly change direction 
based on environmental signals, others ignore the RGB in¬ 
puts and simply move in smooth arcs and then plant once 
a desired heading has been reached. Runs with the forgiv¬ 
ing MC exhibit the widest diversity of behaviors, reflecting 
the clear yet diverse RGB signals that emerge in the envi¬ 
ronment as individuals successfully plant. Finally, when the 
MC is completely absent, individuals tend to follow simple 
trajectories with no inherent purpose. Eventually, planting 
behaviors end up disappearing completely in this case, and 
senselessly crashing into walls becomes a popular behavior. 
Recall that the second goal of this experiment is to test the 
hypothesis that an MC is necessary for OEE. If this hypothesis
is true, then evolution should stagnate when the MC is
absent. In fact such stagnation (as depicted in Figure 5) does 
occur in every control run. 

The qualitative view is strengthened by complementary 
quantitative analysis. Of course, the quantification of open- 
endedness is a contentious topic, reflecting the general lack 
of consensus regarding the phenomenon’s salient features. 
Activity statistics (Bedau et al., 1998; Channon, 2003), 
which measure the persistence of advantageous genotypes 
over the course of evolution, present one possibility for 
quantification. However, alternate metrics are devised for 
the experiments in this paper so that the control runs (which 
completely eliminate genotypic advantage by giving every 
individual an offspring) can be included in comparisons. 

Figure 5: Representative worlds over time. Worlds with different MC strictness are depicted after the specified numbers of
reproductions have occurred. The variable r quantifies the difficulty of the MC (detailed in the Experiment section). When the
MC is strict, evolution exhibits little progression and the breeding population lacks diversity (an undesirable result for OEE). As
the MC is relaxed, diversity increases but stability eventually plunges into chaos (eventually leading back to stagnation). Videos
demonstrating the complexity of behavior in additional runs are available at http://eplex.cs.ucf.edu/chromaria-resources/.

The first metric, which is an application of the graph-theoretic
notion of connected components (Hopcroft and Tarjan,
1973), interprets the RGB phenotype space as a three-dimensional
cube-shaped graph (Figure 6). Every 250 reproductions,
each individual in the parent queue is placed into
one of 216 bins based on the R, G, and B ratios in its
morphology. The individual bins can then be treated as nodes in
a graph, with connections existing between adjacent nodes. 
Evolutionary divergence (or lack thereof) can then be
approximated by the average number of disjoint components
(dc) in the RGB cube at any given time in the run.
Interestingly, the minimal value of 1 therefore jointly indicates
that evolution is either maximally ordered (when few bins
are filled) or maximally chaotic (when all bins are filled).
Between these extremes, dc increases as evolution diverges.
However, this metric (which indicates divergence only every
250 reproductions) does not tell the full story because it does
not reveal how much of the space was eventually explored
over the entire run. For this reason, the average total number
of bins filled over the course of an entire run (bf) is also
recorded for each MC level.
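
As a rough illustration of the disjoint-components measure, the sketch below bins color ratios into a 6 x 6 x 6 cube (216 bins) and counts connected components among the filled bins. Several details are assumptions made for the example: that each of the R, G, and B ratios lies in [0, 1] and is split into six equal intervals, that "adjacent" means sharing a face, and the function names themselves.

```python
from collections import deque

BINS_PER_AXIS = 6  # 6 ** 3 = 216 bins in total

# Face-adjacent neighbors in the cube (an assumed notion of adjacency).
NEIGHBOR_OFFSETS = (
    (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1),
)


def bin_index(ratios):
    """Map (r, g, b) ratios in [0, 1] to a cell of the 6 x 6 x 6 cube."""
    return tuple(
        min(int(v * BINS_PER_AXIS), BINS_PER_AXIS - 1) for v in ratios
    )


def count_components(filled_bins):
    """Count connected components among filled bins by breadth-first search."""
    filled = set(filled_bins)
    seen = set()
    components = 0
    for start in filled:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        frontier = deque([start])
        while frontier:
            x, y, z = frontier.popleft()
            for dx, dy, dz in NEIGHBOR_OFFSETS:
                neighbor = (x + dx, y + dy, z + dz)
                if neighbor in filled and neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append(neighbor)
    return components
```

A single filled bin and a completely filled cube both give a count of 1, which matches the observation that the minimal value is shared by the maximally ordered and maximally chaotic cases.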

Figure 7 depicts dc over time for each run. As expected,
when the MC is strict (r = 95%) the average number of disjoint
components and number of bins filled per run are both
low (dc = 1.34, σ = 0.59; bf = 51/216), indicating limited
evolutionary divergence. Interestingly, when the MC is relaxed
to the more moderate baseline value (r = 87.5%), dc
increases only slightly to 2.24 (σ = 0.42), but bf sharply
increases to 178.5/216 (σ = 17.56), signifying a similar degree
of order as for the strict MC, but much faster traversal
of phenotype space over time. As the MC becomes easier
to satisfy, the runs display an increased tendency towards
divergence and chaos. In the forgiving case (r = 75%), dc
increases dramatically to 8.59 (σ = 0.97) and the population
eventually fills all 216 bins. The dc measure remains high
(12.16, σ = 0.79) and all bins are still filled when the MC is
completely absent, but recall that the world nevertheless
eventually stops changing entirely without the MC because of the
eventual loss of the ability to plant.

Figure 6: Discretized RGB space. Every Chromarian in the
parent queue is placed into 1 of 216 bins based on the R, G,
and B ratios in its morphology. The number of connected
components in the color space approximates the amount of
correlation in the parent queue. When all bins are filled (as
is the case when the system is completely chaotic) as shown
above, the number of connected components is 1. It is also
1 when the system is completely converged.

Figure 7: Average number of disjoint components in the
parent queue over time. A minimal value of 1 indicates
that evolution is either maximally ordered (when few bins
are filled) or maximally chaotic (when all bins are filled).
However, high values may also indicate a tendency towards
chaos. For this experiment, samples are taken every 250
reproductions. Student’s t-tests with p-values p < 0.05
indicate significant differences between strict versus moderate
runs for 97.21% of parent queue samples, for 100% of
the samples between moderate versus forgiving runs, and for
98.64% of samples between forgiving and control (no MC)
runs.

Figure 8: Average planting success rates over time. The
number of successful planters increases steadily and eventually
approximately doubles over the course of the non-control
runs, indicating that evolution reliably discovers
nontrivial behaviors. However, when there is no MC (the
control) evolution degenerates so rapidly into triviality that
evolution halts altogether. This result confirms the hypothesis
that a minimal criterion is necessary for OEE.

The catastrophic effects of removing the MC are clearly 
reflected also in the immediate and inevitable decline in 
planting success rates (Figure 8). In contrast, when there 
is an MC of any strictness, planting success rates steadily 
increase over time, suggesting a nontrivial evolutionary pro¬ 
cess leading to gradual behavioral improvements. 

Discussion 

Combining the number of connected components with the 
overall number of bins filled paints a quantitative picture of 
evolution’s trajectory. When the MC is too strict, evolution 
proceeds so slowly as to become stagnant. Unsurprisingly, 
there is little variation between runs in terms of color space 
coverage because search does not progress far beyond the 
initial magenta and blue subspace. The MC in this case en¬ 
courages genetic homogeneity and leads to an effective equi¬ 
librium. At the other extreme, when the minimal criterion is 
absent (the control case), the system initially exists so far 
from equilibrium that it verges on chaos. While such a sys¬ 
tem may avoid attractors and thus maximize coverage of the 
search space, it may also lack the qualities of cohesion and 
correlation that differentiate evolution from random search. 
Most problematic, though, is the proliferation of degenerate 
behaviors because of the lack of pressure to satisfy the MC. 
The lack of new planters even in the control when r is 0 
(evidenced by eventual planting success rates of 0) indicates 
that nobody is requesting to plant even though any such re¬ 
quest would be granted regardless of the colors in the world. 
This result may at first seem surprising, but in fact strongly 
supports the hypothesis that maintaining a minimal level of 
behavioral complexity is necessary for OEE. 

These experiments illustrate the complex manner in 
which the strictness of the explicit MC impacts evolution 
in Chromaria. However, the results are important more 
broadly for their potential to generalize to any open-ended 
alife world that limits which individuals are allowed to re¬ 
produce. In other words, modifying the intrinsic strictness 
of the implicit MC in other worlds may have similarly dra¬ 
matic implications for their degree of open-endedness. This 
issue is nontrivial because the MC prunes the space being 
explored by evolution. If there were no MC on Earth, for 
example, every organism would reproduce regardless of its 
viability. As a result, degeneracy would thrive whereas oth¬ 
erwise it could not. If the MC were too difficult (in this case, 
anything requiring functionality beyond that of a single cell), 
foundational building blocks would be discarded. Thus set¬ 
ting the wrong MC can have disastrous implications for a 
system’s ability to uncover the maximal amount of
interestingness in a search space. While a designer may be able to
compensate for an overly strict MC by increasing run time,
if the run time is capped it may be desirable to relax the
MC through whatever means available to increase coverage
of the search space in the allotted time.

An emphasis on optimization in evolutionary computation 
has influenced the primacy of competition in the design of 
alife worlds. Unlike competition-based selection, the MC 
admits innovations that offer no discernible benefit for an 
individual (which would otherwise have been discarded by 
optimization-oriented evolution). For this reason, the MC 
allows evolution to maximize its potential as a creative force. 

However, competition is not the only overly restrictive 
mechanism. Any selective force that arbitrarily imposes a 
strict limit on reproduction (i.e. by only allowing a certain 
percentage of the population to generate offspring) will suf¬ 
fer the same inhibited creative output, including on Earth. 
Many mechanisms implemented in artificial life systems can 
be reduced to an MC-based interpretation in this way. By
abstracting away the system-specific details and working at the
most general level, we can develop a model of open-ended 
evolution with maximum applicability. 

Conclusion 

Life strikes a delicate balance between order and chaos. 
In alife worlds, this balance is moderated at least in part 
by an often-implicitly-implemented minimal criterion (MC). 
As results in this paper have shown, tuning this mechanism 
incorrectly can have disastrous implications for a system’s 
open-endedness. However, if configured correctly, the MC 
can create a dynamic environment in which evolved life 
flourishes.

Abstract

The importance of environmental fluctuations in the evolu¬ 
tion of living organisms by natural selection has been widely 
noted by biologists and linked to many important characteris¬ 
tics of life such as modularity, plasticity, genotype size, muta¬ 
tion rate, learning, or epigenetic adaptations. In artificial-life 
simulations, however, environmental fluctuations are usually 
seen as a nuisance rather than an essential characteristic of 
evolution. HetCA is a heterogeneous cellular automaton
characterized by its ability to generate open-ended long-term
evolution and “evolutionary progress”. In this paper, we
propose to measure the impact of different types of environmental
fluctuations in HetCA. Our results indicate that environmental
changes induce mechanisms analogous to epigenetic
adaptation or multilevel selection. This is particularly prevalent
in two of the tested fluctuation schemes, which involve a
round-robin inhibition of certain cell types, where phenotypic
selection seems to occur.

Introduction 

In natural evolution, environmental changes may include 
cyclic events such as seasonal changes and the daily alter¬ 
nation of light and darkness, occasional changes such as the 
appearance of new predators and the potential for new food 
sources, or more radical modifications such as environmen¬ 
tal stresses induced by climate transitions. 

Since the work of Levins (1968) on evolution in changing 
environments and more recent attempts to integrate “epige¬ 
netic inheritance systems” (EIS) (Heard and Martienssen, 
2014), developmental or “evo-devo” processes (Muller, 
2007) and “niche construction” effects (Laland et al., 2016), 
emphasis has been put on the importance and impact of 
changes on the course of the evolution of living organisms 
by natural selection. Through such works, environmental 
fluctuations have been linked to many central properties and 
mechanisms of evolution, among which the most discussed 
examples are modularity, plasticity (West-Eberhard, 2005), 
genome size, mutation rates, and evolvability. 

For Jablonka et al. (2014), changing environments un¬ 
mask variations in the capacity of individuals to make ad¬ 
justments to new conditions, therefore promoting plasticity 
and multilevel selection. These authors contend that “For 


a lineage in a constantly changing environment, switching 
among several alternative heritable states was probably an 
advantage. While cells in one state survived in one set of 
conditions, those in other states did better in different
circumstances.” (ibid., p. 318). In the same line of thought,
continually varying or cyclic conditions might also explain 
the origin of EIS, as “epigenetic mutations” are more re¬ 
versible and occur more frequently than genetic ones. To 
illustrate this notion, Lachmann and Jablonka (1996) mod¬ 
eled the effects of oscillating variations, such as seasonal or 
daily cycles, on phenotypical inheritance. Their model pre¬ 
dicts correctly that when the environmental cycle is longer 
than the reproductive cycle, while remaining relatively short 
otherwise, heritable variations produced by non-DNA inher¬ 
itance systems are likely to be observed. 

In parallel to biological research, a number of studies in 
artificial life, especially evolutionary robotics (Floreano and 
Urzelai, 2000), have also investigated environmental varia¬ 
tions, some of them explicitly defining the environment as a 
driving evolutionary force (Bredeche and Montanier, 2012). 
Others, such as Lipson et al. (2002), showed a correlation 
between the modularity and the rate of change of external re¬ 
sources, while Yu (2007) observed that populations exploit 
neutrality to cope with environmental fluctuations and can 
evolve a type of evolvability under two alternating objective 
functions. Both of these simulation works relied on genetic 
programming (GP) and explicit fitness functions. 

We wish to study the effects of such fluctuations in a 
model closer to the living world without using an explicit 
objective function, and determine if these fluctuations pro¬ 
mote phenotypic selection over genotypic selection. To that 
goal, we propose an open-ended experimental setup allow¬ 
ing us to systematically and quantitatively measure the in¬ 
fluence of cyclic environmental fluctuations on the course of 
the evolution of cellular automata (CA). We show that such 
fluctuations lead to the emergence of processes similar to 
those exhibited by EIS. 

The paper is organized as follows. First the general mech¬ 
anisms of our Heterogeneous Cellular Automata (HetCA) 
model are explained. Then, the implementation of environmental
fluctuations in HetCA is described. Next, we specify
the computational setup used to study environmental fluc¬ 
tuations. This is followed by a report on the experimental 
results and a discussion of their implications. Finally, we 
propose a qualitative analysis and a conclusion. 

The HetCA Model 

HetCA (Medernach et al., 2013) is based on classical two- 
dimensional CA with several additional features: cells fol¬ 
low a heterogeneous transition function, i.e. one which de¬ 
pends on their location, inspired by linear genetic program¬ 
ming (LGP); they can also fall into special decay and qui¬ 
escent states; and there is a notion of genetic transfer of 
transition functions (i.e. genotypes) between adjacent cells. 
Decay and quiescent cells do not possess a genotype; all 
other cells do, and are called living. There are 5 different
living states. Quiescent cells can acquire a genotype from
any nearby living cell and therefore become living in turn;
decay cells cannot, but become quiescent after between 375
and 1,875 consecutive iterations spent in decay (Fig. 1).
Living cells always automatically turn
into decay after 7 consecutive iterations spent in any one or 
several living states (tracked by an “age” counter). Further 
details about the HetCA model can be found in (Medernach 
et al., 2013). 

We showed that HetCA could exhibit long-term pheno¬ 
typic dynamics, a high variance over very long runs, greater 
behavioral diversity than classical CA, and “evolutionary 
progress” (Shanahan, 2012) on three criteria: robustness, 
size and density of the genotype (Medernach et al., 2015). 

Finally, while there is a lasting debate over the units of se¬ 
lection in evolutionary biology since the origins of the field, 
including genotype selection, phenotype selection, epige¬ 
netic selection, behavioral selection, multilevel selection, 
group selection, and so on (Lloyd, 2012; Okasha, 2006), 
several of them are potentially included in HetCA. There 
is genotypic selection of the transition rules, but also phe¬ 
notypic selection of cell groups able to replicate patterns, 
such as the ones found in the Game of Life. This point is 
important when one is interested in environmental fluctua¬ 
tions because, as mentioned in the introduction, we antic¬ 
ipate that the existence of frequent environmental fluctua¬ 
tions will promote phenotypic selection over genotypic se¬ 
lection. 

Experimental Setup 

As in the previous version of HetCA (Medernach et al.,
2013), the genotype of an individual consists of its transition
rules encoded in a custom CA-LGP program using the function
set listed in Table 1. Such a program maps the space of
neighborhood states to a new cell state, while providing an
evolvable representation framework based on an alphabet of
elementary functions. Individual genotypes are modified by
micro-mutations (change in one component of a statement)
and macro-mutations (addition or removal of an entire statement)
of the corresponding CA-LGP programs.

Figure 1: [state diagram not recovered; surviving label: Decay state]

Table 1: Function set.

operator   action on inputs (x, y)
abs        |x|
plus       x + y
delta      1, if |x - y| < 1/10^4; 0 o.w.
dist       |x - y|
inv        1 - x
inv2       safeDiv(1, x)
magPlus    |x + y|
max        max{x, y}
min        min{x, y}
safeDiv    x/y, if |y| > 1/10^4; 1 o.w.
safePow    x^y, if defined; 1 o.w.
thresh     1, if x > y; 0 o.w.
times      xy
zero       1, if |x| < 1/10^4; 0 o.w.
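
To make the function set of Table 1 concrete, the operators could be written as plain two-argument Python functions, as sketched below. This is an illustrative reading of the table, not the HetCA implementation: the dictionary layout, the choice to let unary operators ignore their second input, and the interpretation of "if defined" for safePow (a finite real result) are assumptions.

```python
import math

EPS = 1 / 10**4  # tolerance used by delta, safeDiv, and zero in Table 1


def safe_div(x, y):
    """x / y when |y| exceeds the tolerance, otherwise 1."""
    return x / y if abs(y) > EPS else 1.0


def safe_pow(x, y):
    """x ** y when the result is a finite real number, otherwise 1."""
    try:
        result = float(x) ** float(y)
    except (ZeroDivisionError, OverflowError):
        return 1.0
    if isinstance(result, complex) or not math.isfinite(result):
        return 1.0
    return result


# Unary operators ignore their second input (an assumption of this sketch).
FUNCTIONS = {
    "abs": lambda x, y: abs(x),
    "plus": lambda x, y: x + y,
    "delta": lambda x, y: 1 if abs(x - y) < EPS else 0,
    "dist": lambda x, y: abs(x - y),
    "inv": lambda x, y: 1 - x,
    "inv2": lambda x, y: safe_div(1, x),
    "magPlus": lambda x, y: abs(x + y),
    "max": lambda x, y: max(x, y),
    "min": lambda x, y: min(x, y),
    "safeDiv": safe_div,
    "safePow": safe_pow,
    "thresh": lambda x, y: 1 if x > y else 0,
    "times": lambda x, y: x * y,
    "zero": lambda x, y: 1 if abs(x) < EPS else 0,
}
```

The protected variants (safeDiv, safePow) return 1 instead of raising, so any sequence of statements built from this set always evaluates to a number.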

Previously, the new genotype g of a living cell c during
genetic transfer was chosen randomly among candidate genotypes
according to a uniform distribution. To be a candidate,
a genotype had to come from a living cell in c’s immediate
neighborhood (von Neumann). In the present study, to
introduce environmental variations we vary the likelihood of
propagation of a genotype according to its cell state s(c). In
this new setup, the probability P(c) of a candidate genotype
to be selected becomes $P(c) = K(s(c)) / \sum_{i=1}^{n} K(s(c_i))$,
where K(s) is the state distribution and n the number of
neighboring candidate genotypes, i.e. 4. Thus, an environment
E is characterized by the propagation probabilities of
the 5 possible living states: $E = \{K(s_1), \ldots, K(s_5)\}$. To
mimic environmental fluctuations we initialize the simulation
with $K(s_k) = 1$ for all $k \in [1, 5]$, then regularly modify
those values every f iterations starting from iteration 3,000
of the CA. We introduce three types of environmental
fluctuations (Table 2):

Short-cycle fluctuations (ScF) consist of alternating
between two opposite environments, {0,0,1,1,1} and
{1,1,1,0,0}, every f iterations of the CA. We set f = 100
to remain within the range of frequency described by Lipson
et al. (2002) and Yu (2007). Here we consider that a successful
reproductive cycle for a cell involves passing through the
quiescent state. This should take between 2 iterations
(alternating between quiescent and living) and 7 iterations (after
which a living cell decays and can no longer receive a genotype
for a long period of time).

Table 2: Stable and fluctuating environments. The environment list gives the propagation probabilities E = {K(s_1), ..., K(s_5)} of the living types.

Name                      Short name  Cycles  Transitions  Environment list
Stable environment        SE          NA      NA           {1,1,1,1,1}
Short-cycle fluctuations  ScF         100     1            {0,0,1,1,1}, {1,1,1,0,0}
Light fluctuations        LF          5,000   1            {1,1,1,1,0}, {1,1,1,0,1}, {1,1,0,1,1},
                                                           {1,0,1,1,1}, {0,1,1,1,1}, {1,1,1,1,1}
Strong fluctuations       SF          5,000   1            {0,0,1,1,1}, {1,1,1,0,0}, {0,1,0,1,1},
                                                           {1,0,1,1,0}, {0,1,1,0,1}, {1,1,0,1,0},
                                                           {1,0,1,0,1}, {0,1,1,1,0}, {1,0,0,1,1},
                                                           {1,1,0,0,1}, {1,1,1,1,1}

Table 3: HetCA parameters.

Parameter                                  Value
Number of living states                    5
Successive living iterations before decay  7
Number of iterations during decay          375 to 1,875
Direct transition to decay                 enabled
Size of the grid                           500 x 500
Grid boundaries                            toroidal grid
Transition Rule (TR)                       CA-LGP
Maximum TR size                            50 statements
Genotype copy neighborhood                 von Neumann (4)
Transition rule neighborhood               Moore (8)

Light fluctuations (LF) consist of alternating between 6
different environments every f = 5,000 iterations. The first
5 environments each prohibit a different living state from
spreading its genotype; the last one gives an equal chance to
all living states.

Strong fluctuations (SF) consist of alternating between 11
environments every f = 5,000 iterations. The first 10
environments each prohibit a different pair of living states from
spreading their genotypes; the last one gives an equal chance
to all pairs.

The rationale behind ScF is their analogy with circadian 
rhythms in certain bacteria. The idea is to mimic the highly 
regular cycles during which these organisms have enough 
time to reproduce repeatedly. LF, by contrast, are more sim¬ 
ilar to seasonal fluctuations, while SF resemble ecological 
crises. However, owing to the variety of both biological 
temporal rhythms and reproductive cycles, the relevance of 
these analogies remains limited. 
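
The propagation rule and the fluctuation schedules of Table 2 can be sketched together as below. This is a minimal illustration under assumptions, not the HetCA code: the function names are hypothetical, the first environment in each list is assumed to become active exactly at iteration 3,000, and the case where every candidate has a propagation probability of 0 (which the text does not address) is handled by returning all zeros.

```python
ALL_ONES = (1, 1, 1, 1, 1)
START = 3000  # iteration at which fluctuations begin

# Short name -> (period f in iterations, cyclic list of environments E).
SCHEDULES = {
    "SE": (None, [ALL_ONES]),
    "ScF": (100, [(0, 0, 1, 1, 1), (1, 1, 1, 0, 0)]),
    "LF": (5000, [(1, 1, 1, 1, 0), (1, 1, 1, 0, 1), (1, 1, 0, 1, 1),
                  (1, 0, 1, 1, 1), (0, 1, 1, 1, 1), ALL_ONES]),
    "SF": (5000, [(0, 0, 1, 1, 1), (1, 1, 1, 0, 0), (0, 1, 0, 1, 1),
                  (1, 0, 1, 1, 0), (0, 1, 1, 0, 1), (1, 1, 0, 1, 0),
                  (1, 0, 1, 0, 1), (0, 1, 1, 1, 0), (1, 0, 0, 1, 1),
                  (1, 1, 0, 0, 1), ALL_ONES]),
}


def environment(name, t):
    """Environment E = (K(s_1), ..., K(s_5)) active at iteration t."""
    period, envs = SCHEDULES[name]
    if period is None or t < START:
        return ALL_ONES
    return envs[((t - START) // period) % len(envs)]


def selection_probabilities(neighbor_states, env):
    """P(c_j) = K(s(c_j)) / sum_i K(s(c_i)) over candidate neighbors.

    neighbor_states holds the living-state index (1 to 5) of each candidate.
    """
    weights = [env[s - 1] for s in neighbor_states]
    total = sum(weights)
    if total == 0:
        return [0.0] * len(weights)  # assumed: no candidate can propagate
    return [w / total for w in weights]
```

For example, with the environment {0,0,1,1,1} and four candidates in states 1, 3, 4, and 5, the first candidate is never selected and the remaining three are equally likely.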

Simulations 

For each one of the three types of environmental fluctua¬ 
tions and the stable (non-fluctuating) environment (SE) we 
performed 50 simulations. This produced a total of 4 x 50 = 
200 runs. Each cell of the CA was initialized in a random 
state, then each cell in one of the 5 living states was initial¬ 
ized with an individual randomly generated genotype. Each 
run lasted 500,000 iterations under the parameters listed in 
Table 3. 


Genotype Size 

We used the number of program statements n_prog as a measure
of genotype size and computed the average size of all
current genotypes of a run every 2,500 iterations. We then 
reported the average and standard error of the mean (SEM) 
among all 50 runs sharing the same settings. 

Phenotype Comparison 

If environmental changes led to the emergence of pheno¬ 
typic selections (similar to the EIS) using easily reversible 
phenotypic mutations (Jablonka et al., 2014), then pheno¬ 
types from different individuals of the same lineage ob¬ 
served while environmental conditions are similar should 
stay relatively close, even though individuals from their 
lineage evolved in other environmental conditions between 
these measures. By contrast, if the adaptation to each en¬ 
vironmental change was done exclusively through the selec¬ 
tion of classical, irreversible genotypic mutations, these phe¬ 
notypes should be quite different, despite the potential evo¬ 
lutionary convergence. We developed a metric to measure 
phenotypic proximity between two iterations of the CA. To 
do this we simply used the distributions of living cells over 
the living states. Thus the phenotypic difference $\sigma$ between
two iterations $t_1$ and $t_2$ was calculated as follows:

$$\sigma(t_1, t_2) = \sum_{k=1}^{5} \left| \frac{N(s_k, t_1)}{N(t_1)} - \frac{N(s_k, t_2)}{N(t_2)} \right| \qquad (1)$$

where $N(s_k, t)$ is the number of cells in living state $s_k$ at
iteration $t$ and $N(t) = \sum_{k=1}^{5} N(s_k, t)$ is the total number
of living cells at $t$.
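
As an illustration, the phenotypic difference can be sketched in a few lines of Python. This is a minimal sketch, assuming that Eq. (1) is read as the sum of absolute differences between the two living-state frequency distributions; the function name and the input format (one count per living state) are hypothetical and not taken from the HetCA implementation.

```python
def phenotypic_difference(counts_t1, counts_t2):
    """Sum of absolute differences between two living-state distributions.

    counts_t1, counts_t2: number of cells in each living state s_1..s_5
    at iterations t1 and t2 (hypothetical input format).
    """
    n1 = sum(counts_t1)  # N(t1): total number of living cells at t1
    n2 = sum(counts_t2)  # N(t2): total number of living cells at t2
    if n1 == 0 or n2 == 0:
        raise ValueError("both iterations need at least one living cell")
    return sum(abs(a / n1 - b / n2) for a, b in zip(counts_t1, counts_t2))


# Identical proportions give 0; fully disjoint distributions give 2.
same = phenotypic_difference([10, 10, 10, 10, 10], [1, 1, 1, 1, 1])
disjoint = phenotypic_difference([5, 0, 0, 0, 0], [0, 5, 0, 0, 0])
```

Under this reading the measure ranges from 0 (identical proportions) to 2 (no living state in common), independently of the total number of living cells.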

Every 5,000 iterations of the CA we performed two phenotypic
comparisons between the current iteration $t_1 = t$
and an iteration in the past, $t_2 = t - \Delta t$. In one
scenario, the temporal distance $\Delta t$ was a multiple of the
periodicity $f$, so that we compared two similar environments:
$E(t_1) = E(t_2)$. We chose $\Delta t$ = 60,000 in the SE, ScF and
LF cases and 55,000 in the SF case. In another scenario,
we introduced an additional single-period shift such that we
compared two dissimilar environments: $E(t_1) \neq E(t_2)$ but
$E(t_1) = E(t_2 + f)$. Here $\Delta t$ was respectively equal to
60,100, 65,000 and 60,000 in the ScF, LF and SF cases.
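
These choices of temporal distance can be checked with a short worked example. The sketch below assumes that environments cycle in a fixed order starting at iteration 0 (an assumption made for illustration only) and uses hypothetical helper names. It covers LF (6 environments) and SF (11 environments), both with a period of 5,000 iterations.

```python
def environment_index(t, n_environments, period):
    """Index of the environment active at iteration t.

    Assumes a fixed cyclic order starting at t = 0 (illustrative assumption).
    """
    return (t // period) % n_environments


def compare_environments(t1, delta_t, n_environments, period):
    """Return (E(t1) == E(t2), E(t1) == E(t2 + period)) for t2 = t1 - delta_t."""
    t2 = t1 - delta_t
    e1 = environment_index(t1, n_environments, period)
    similar = e1 == environment_index(t2, n_environments, period)
    shifted = e1 == environment_index(t2 + period, n_environments, period)
    return similar, shifted


PERIOD = 5_000
lf_similar = compare_environments(100_000, 60_000, 6, PERIOD)      # (True, False)
lf_dissimilar = compare_environments(100_000, 65_000, 6, PERIOD)   # (False, True)
sf_similar = compare_environments(100_000, 55_000, 11, PERIOD)     # (True, False)
sf_dissimilar = compare_environments(100_000, 60_000, 11, PERIOD)  # (False, True)
```

In both cases the first distance is a whole number of cycles (6 × 5,000 = 30,000 for LF, 11 × 5,000 = 55,000 for SF), and the second adds exactly one period.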


Diversity 

We used the “true diversity index” of order two (Jost, 2006)
to measure the phenotypic and genotypic diversity at every
iteration $t$ of the CA:

$${}^2D_p(t) = 1 \Big/ \sum_{k=1}^{5} \frac{N^2(s_k, t)}{N^2(t)}, \qquad {}^2D_g(t) = 1 \Big/ \sum_{g=1}^{G(t)} \frac{N^2(g, t)}{N^2(t)}, \qquad (2)$$

where $G(t)$ is the number of distinct genotypes at iteration $t$
and $N(g, t)$ is the number of cells sharing genotype $g$ at
iteration $t$. Note that $N(t) = \sum_{k=1}^{5} N(s_k, t) = \sum_{g=1}^{G(t)} N(g, t)$.
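
The order-two true diversity is the inverse of the sum of squared relative abundances. The following minimal sketch computes it for both cases; the function names and the representation of genotypes as hashable identifiers are assumptions made for illustration.

```python
from collections import Counter


def true_diversity_order2(counts):
    """Inverse of the sum of squared relative abundances (order-two diversity)."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one living cell is required")
    return 1.0 / sum((c / total) ** 2 for c in counts)


def genotypic_diversity(genotype_ids):
    """Diversity over genotypes, given one (hypothetical) identifier per living cell."""
    return true_diversity_order2(Counter(genotype_ids).values())


# Five equally frequent living states give a diversity of 5;
# a single dominant state gives 1.
even = true_diversity_order2([4, 4, 4, 4, 4])
dominated = true_diversity_order2([10, 0, 0, 0, 0])
```

The value can be read as an effective number of equally common types, which is why the phenotypic index is bounded by 5 while the genotypic index is bounded by $G(t)$.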


Homogeneous Test 

We also collected the most common genotype (MCG),
i.e. the most frequently occurring one, starting at iteration
2,500,¹ then 102,500 and again about every 100,000 steps
until iteration 500,000, thus creating 6 sampling points. All
MCGs collected in that way were exported and tested alone
in homogeneous conditions, i.e. where all cells were
initialized with that genotype and no mutations could occur. Since
each collected MCG from any one of the four environment types
(SE, ScF, LF and SF) was cross-tested again in all four environments,
we performed a total of 4 × 50 × 6 × 4 = 4,800 runs. The
maximum duration of these runs was set to 60,000 iterations.
Sometimes the genotype was not adapted to the environment 
and all living cells went extinct (i.e. turned into the decay or 
quiescent state) before the end of the simulation. We con¬ 
sidered a homogeneity test to be successful if living cells did 
not all go extinct before 60,000 iterations. In the Results sec¬ 
tion, we report the success rate of those simulations, along 
with statistics on the last iterations of the failed runs. 


Phenotypic Disturbance in the Homogeneous Test 

In HetCA, to survive in the long term, genotypes must
regularly “release” cells by transforming them into quiescent
cells without genotypes. This generates patterns and cycles
that are easy to observe in homogeneous simulations where a
single genotype is tested (Fig. 2a). They are also observable,
although with greater difficulty, in standard heterogeneous
HetCA simulations (Fig. 2b). This is why, to characterize
phenotypes, we found it useful to measure these cycles as
well as their irregularities. At every iteration $t > 8$ of the
homogeneous genotype test, we compared the sequence of
states $s(c, t)$ and $s(c, t-1)$ of each cell $c$ to its anterior
sequences during the previous 8 iterations of the CA. We
assessed whether this sequence of two states was repeated
and with what periodicity $p = \min\{p \in [2, 7]\}$, i.e. such that
$(s(c, t), s(c, t-1)) = (s(c, t-p), s(c, t-p-1))$. We used
a sequence of two states because if the genotype of a cell
adopts a stable strategy, i.e. repeats a sequence of states, this
sequence must contain a minimum of two states in order to
ensure the survival of the genotype: the quiescent state and
one of the living states. We chose to limit the comparison to

¹ It was shown by Medernach et al. (2015) that during the initial
iterations of HetCA the MCGs were unlikely to exhibit any viable 
survival strategy. 



Figure 2: Examples of 6-step survival strategies: (a) Genotype 
extracted from a HetCA simulation in a stable environment (SE) 
with random homogeneous initialization, (b) Genotype produced 
by short-cycle fluctuations (ScF). 


the 8 previous iterations in order to reduce the computational 
cost and because the limit of 7 consecutive live iterations 
before decay involves, for a successful regular phenotype, a 
maximum periodicity of 7 iterations for the quiescent state. 
We performed this measure only if there was at least one liv¬ 
ing state and no decay among the last two states. For each 
iteration $t$ of the CA, we reported the phenotypic disturbance

$$P(t) = \sum_{p} \left| \frac{N(p, t)}{N(t)} - \frac{N(p, t-1)}{N(t-1)} \right| \qquad (3)$$

where $N(p, t)$ is the total number of cells that had periodicity
$p$ at iteration $t$. This measure is rough but interesting
because, unlike phenotypic differences, it is not directly based
on states and therefore is less likely to be correlated to a
state’s probability of propagation.
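
The periodicity detection described above can be sketched as follows. This is an illustrative reading of the text rather than the HetCA code: the state history of a cell is assumed to be a list ordered from oldest to most recent, states are arbitrary comparable labels, and the additional condition on living and decay states is left out for brevity.

```python
def detect_periodicity(history, min_p=2, max_p=7):
    """Smallest p in [min_p, max_p] such that the last two states of a cell,
    (s(t), s(t-1)), equal (s(t-p), s(t-p-1)); None if no such p exists.

    history: states of one cell, oldest first, most recent last
    (hypothetical format); it must cover iterations t-8 .. t.
    """
    if len(history) < max_p + 2:
        raise ValueError("history must contain at least max_p + 2 states")
    current = (history[-1], history[-2])
    for p in range(min_p, max_p + 1):
        if (history[-1 - p], history[-2 - p]) == current:
            return p
    return None


# A cell cycling through two living states and the quiescent state "Q"
# repeats its last pair of states every 3 iterations.
cycle = detect_periodicity(["A", "B", "Q"] * 4)
# A cell that never repeats a pair of states has no detected periodicity.
irregular = detect_periodicity(list(range(12)))
```

Counting, at each iteration, how many cells fall into each periodicity class then gives the quantities $N(p, t)$ used by the disturbance measure.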


Results 

Genotype Size and Genotype Mutations 

In the evolution of genotype size under the 4 types of envi¬ 
ronmental fluctuations (Fig. 3), we notice that the imposed 
size limit of 50 program statements tends to blur the dif¬ 
ferences between the different scenarios since most simula¬ 
tions converge to this limit. Yet, ScF clearly restricts the size 
of the genotypes more severely than SE, LF and SF, while 
these other conditions do not appear to influence genotype 
size. This size reduction by ScF could be a way to increase 
the impact of genotypic mutations on the phenotype. This is 
because, even though the number of statements potentially 
affected by mutations in LGP increases proportionally with 
genotype size, hence could have a larger effect on the phe¬ 
notype, there can also be a “buffer effect” brought by infor¬ 
mation redundancy in longer genotypes, which would in fact 
stabilize the phenotype. Hence, mutations in more compact
genomes might end up being more impactful.²

Observing the amount of mutations separating the current 
MCG from individuals created during initialization (Fig. 4), 

² This assessment was supported by an analysis of the effective
length of the most common individuals (not reported here).

Figure 3: Evolution of genotype sizes (average ± SEM) in number
of LGP statements.



Figure 4: Evolution of number of ancestral mutations (average
± SEM) involved in producing the current MCG.

it is not surprising to see that more mutations are selected
for when environmental fluctuations are introduced. Their
number also seems to depend more on the strength of these
fluctuations (i.e. the contrast between two successive
propagation probability patterns $E$) than on their periodicity. We
also note that the proximity of SF and ScF indicates that
their differences in size are not explained by differences in
the number of selected mutations.

Phenotypic Comparison 

In the phenotypic comparison $\sigma(t_1, t_2)$ between dissimilar
environments, i.e. at $t_1$ and $t_2 = t_1 - \Delta t$ such that
$E(t_1) \neq E(t_2)$ but $E(t_1) = E(t_2 + f)$ (Fig. 5), we
observe that the impact of environmental fluctuations decreases
quickly for ScF while it remains very high for other types
of environmental fluctuations. The phenotypic difference of
ScF also remains most of the time lower than the phenotypic
differences of the SE. This suggests the selection of a single
phenotype, robust in both environments. Looking at $\sigma$
between similar environments, i.e. such that $E(t_1) = E(t_2)$
(Fig. 6), we note that phenotypic differences in LF and SF
are much lower than in Fig. 5.

Figure 5: Evolution of phenotypic comparison function $\sigma$ (average
± SEM) between dissimilar environments.

Figure 6: Evolution of phenotypic comparison function $\sigma$ (average
± SEM) between similar environments.

Phenotypic and Genotypic Diversity 

Figures 7 and 8 depict the average phenotypic and genotypic 
diversities, ${}^2D_p$ and ${}^2D_g$. The generally low phenotypic
diversity of ScF suggests the existence in this configuration
of a dominant phenotype, which remains rather stable over 
time, whereas the relatively high phenotypic diversity of LF 
and SF combined with their relatively low genotypic diver¬ 
sity might suggest the existence of strong phenotypic selec¬ 
tion, hence some form of plasticity. 

Success Rates of the Homogeneous Test 

Success rates of genotypes in different homogeneous simu¬ 
lations are reported in Fig. 9 using a normal approximation 
with a 95% confidence interval. The fact that SE offers the 
lowest challenge is not surprising. Similarly, the fact that SF 
Figure 7: Evolution of phenotypic diversity (average ± SEM).


is the least conducive to success is also expected. A com¬ 
parison of the levels of difficulty between LF and ScF is less 
clear, however, since ScF performs significantly better in its 
own settings while on the contrary all other tested fluctua¬ 
tions are slightly more efficient in LF. It is also noteworthy 
that individuals from LF and SF seem relatively robust in 
various environmental configurations, while those from ScF 
seem fragile outside the environmental conditions in which 
they evolved. Moreover, among genotypes collected from 
iteration 102,500, these same individuals are the only ones 
that do not reach a 100% survival ratio in a stable environ¬ 
ment. 
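
For reference, a normal-approximation interval for a success rate can be computed as in the sketch below. It assumes the standard Wald form with z = 1.96 for a 95% level and clips the bounds to [0, 1]; both choices, as well as the function name, are assumptions, since the exact procedure is not detailed here.

```python
import math


def success_rate_interval(successes, trials, z=1.96):
    """Success rate with a normal-approximation (Wald) confidence interval.

    Returns (rate, lower, upper), with bounds clipped to [0, 1].
    z = 1.96 corresponds to a 95% confidence level (assumed).
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    rate = successes / trials
    half_width = z * math.sqrt(rate * (1.0 - rate) / trials)
    return rate, max(0.0, rate - half_width), min(1.0, rate + half_width)


# With 50 runs per configuration, 25 surviving runs give a wide interval,
# while 50 surviving runs collapse the interval onto 1.
half = success_rate_interval(25, 50)
full = success_rate_interval(50, 50)
```

Note that this approximation degenerates to a zero-width interval when the observed rate is exactly 0 or 1, which matters for configurations reaching a 100% survival ratio.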

Ending Iteration 

Displaying the last iterations reached by living cells of 
homogeneous runs, with genotypes collected at iteration 
100,000 and 500,000 (Fig. 10), we see that the ScF genotype 
failures are concentrated around iteration 15,000 in LF ho¬ 
mogeneous tests and 25,000 in SF homogeneous tests. This 
corresponds for these two configurations to the first envi¬ 
ronment for which K(ss) = 0, whereas K(ss) = 1 for 
all distributions E in ScF. But we also see that some of 
the genotypes from ScF fail during the early iterations of 
the homogeneous test regardless of the environmental fluc¬ 
tuations, including SE and ScF. This might imply that the 
ecosystem resulting from the evolutionary history of the in¬ 
dividuals plays a key role in their ability to survive in ScF. 

Analysis 

Environmental Transitions 

Figure 11 visualizes the typical transitions between
environments in different homogeneous runs. To compute these
transitions we averaged the phenotypic disturbance $P$ over
iterations $[t - 40, t + 40]$ for all $t > 5{,}000$ and $t \in T_r$, where
$T_r$ is the sequence of iterations of run $r$ at which a transition
between environments effectively occurred, i.e. for which
$E(t) \neq E(t + 1)$. Figure 11d shows the average transitions



Figure 8: Evolution of genotypic diversity (average ± SEM).

of ScF genotypes collected at the 6 aforementioned time 
steps {2,500, 102,500, ..., 500,000} and subjected to a ho¬ 
mogeneous ScF test. It shows that phenotypes correspond¬ 
ing to genotypes collected later in the evolutionary process 
are less sensitive to environmental fluctuations. Conversely, 
as reported in Fig. 11b, the phenotypes of genotypes from
SF keep the same high sensitivity regardless of the itera¬ 
tion at which they were collected. Finally, Fig. 11a and 11c 
compare the average transitions in homogeneous ScF and 
SF tests with genotypes collected at iteration 500,000 from 
the four different configurations. Again, it can be observed 
there that the phenotype of ScF is much more stable than 
the others in its original environment, but is at the same time 
very sensitive to transitions in SF. 

Phenotypic Diversity 

The phenotypic diversity measured in Fig. 7 can also be ob¬ 
served by visual inspection of the CA as displayed in the 
screenshots of Fig. 12. First, LF and SF are visibly different 
from ScF. These two groups diverge significantly in texture 
and also clearly differ from SE. Individuals from ScF seem 
to produce stable and robust phenotypes in any environment 
encountered within a ScF scheme. Their adaptations ap¬ 
pear to be essentially created by genotypic mutations. They 
are also very dependent on their original ecosystem, some¬ 
times distinctively so (Fig. 13), and as a consequence they 
are not robust in other types of fluctuations, where the effect 
of mutations is probably enhanced by the reduced size of the 
genotypes, as mentioned previously. By contrast, individu¬ 
als evolved within LF or SF have high phenotypic diversity, 
and it seems likely that phenotypic selection occurs despite 
their lower genotypic diversity. 

Conclusions 

The phenotypic selection observed in this series of sim¬ 
ulations evokes the multilevel selection model described 
by Jablonka et al. (2014). Among the three tested envi- 
Figure 9: Evolution of success rate of genotypes (average ± SEM) in homogeneous tests.

Figure 10: Last iterations reached by living cells of homogeneous
runs of genotypes from iterations 100,000 and 500,000. Panels:
(a) SE, (b) SF, (c) LF, (d) ScF. Horizontal axis: iterations × 10^3.

Figure 11: Phenotypic disturbance: Average transition between
environments in different types of homogeneous tests. Panels:
(a) SF: genomes from t = 5 × 10^5, (b) SF: genomes from SF,
(c) ScF: genomes from t = 5 × 10^5, (d) ScF: genomes from ScF.


ronmental fluctuations, the LF and SF simulations are dis¬ 
playing the greatest similarity with this model. In ScF, the 
best evolutionary strategy seems to involve small genotypes, 
which could be favored for their capacity to maximize the 
phenotypic impact of mutations. Furthermore, the inability 
of most ScF individuals to survive outside of the ecosys¬ 
tem resulting from their evolutionary history is reminiscent 
of the impossibility of saving species solely by preserving 
their DNA, as claimed in Jablonka et al. (2014): “You would 
have to reconstruct the community, and often these commu¬ 
nities are very old, with historical memories that are stored 
in their epigenetic and behavioral systems. These are part 
of their ‘identity,’ part of their stability. You cannot freeze 
these memories: they have to be maintained and transmit¬ 
ted through use, so you cannot reconstruct the communities 
from their component parts.” (ibid., p. 363). However, these 
experiments alone could not determine whether the main 
distinguishing feature between LF and SF on the one hand 


and ScF on the other hand was the duration of the environ¬ 
mental cycles or the number of environmental types. Further 
investigation is needed. 
Abstract

The field of evolved virtual creatures has been suspiciously 
stagnant in terms of complexification of evolved agents since 
its inception over two decades ago. Many researchers have 
proposed algorithmic improvements, but none have taken 
hold and greatly propelled the scalability of early works. This 
paper suggests a more fundamental problem with co-evolving 
both the morphology and control of virtual creatures simul¬ 
taneously - one cemented in the theory of embodied cogni¬ 
tion. We reproduce and explore in greater detail a previous 
finding in the literature: premature convergence of the mor¬ 
phology (compared to the convergence point of optimizing 
controllers), and discuss how this finding fits as a symptom 
of the proposed problem. We hope that this improved under¬ 
standing of the fundamental problem domain will open the 
door for further scalability of evolved agents, and note that 
early findings from our future work point in that direction. 

Introduction 

In 1994, Karl Sims’ seminal work on “Evolving Virtual 
Creatures” (Sims, 1994b) created a field of study by that 
name. This work featured simulated creatures that were able 
to optimize both their physical layout and their behavioral 
control strategies for such tasks as terrestrial locomotion, 
swimming, phototaxis, and competition (Sims, 1994a). 

The potential applications of virtual creatures extend
beyond their initial contribution to computer graphics and
animation, serving as a testbed for the co-optimization of
brain-body systems in robotics. With the challenges of continually
modifying the morphology of physical robots during the op¬ 
timization process, the field of Evolutionary Robotics often 
turns to virtual creatures to optimize morphologies (and their 
associated controllers) before physical robots are manufac¬ 
tured from the optimized designs (Lund et al., 1997; Funes 
and Pollack, 1998; Lipson and Pollack, 2000; Nolfi and Flo- 
reano, 2002; Doursat et al., 2012; Bongard, 2013). 

However, in the two-decade lifetime of this field, there
have been notable struggles in optimizing creatures, with 
a very limited ability to extend beyond Sims’ initial 
works (Geijtenbeek and Pronost, 2012) - despite significant 
increases in computing power. Many researchers have sug¬ 
gested hypotheses for the cause of this standstill, such as de¬ 


ficiencies in the search algorithms (Hornby, 2006; Lehman 
and Stanley, 2011; Mouret and Clune, 2015) or genetic en¬ 
codings (Hornby et al., 2001; Bongard and Pfeifer, 2003). It 
has also been suggested that the environments/tasks chosen 
are not complex (or morphologically dependent) enough to 
necessitate optimization of both the morphology and con¬ 
troller (Auerbach and Bongard, 2014; Cheney et al., 2015). 
But since we have yet to clearly surpass Sims’ work, each of 
these hypotheses must be approached with some skepticism. 

This work takes note of the particular difficulty in opti¬ 
mizing morphology (Joachimczak et al., 2016) and sets out 
with the intent of proposing a new hypothesis for the field’s 
current roadblock. Our hypothesis, unlike many before it, 
does not rely on more powerful or astute search algorithms 
to laboriously make our way through the rugged and harsh 
search space which makes optimization of virtual creatures so
difficult. Rather, we intend to use our understanding of the 
behavior of virtual creatures, specifically the theory of em¬ 
bodied cognition, to suggest a fundamental issue in the way 
that we frame the problem of optimization of virtual crea¬ 
tures - which in turn causes the search landscapes to present 
such an unpleasant terrain. 

The theory of embodied cognition suggests that a funda¬ 
mental part of the cognitive control process of an individual 
is being situated (Wilson, 2002). It suggests that the dy¬ 
namic interactions between a reactive agent and the environ¬ 
ment, through sensory-motor feedback loops, are an impor¬ 
tant driver of behavior (Brooks, 1991), as opposed to cog¬ 
nitivism - the hypothesis that the central functions of mind 
can be accounted for in terms of the manipulation of sym¬ 
bols according to explicit rules (Anderson, 2003). 

This line of reasoning puts an extra emphasis on the mor¬ 
phology of an individual, as it acts as the lens and modulator 
for all physical communication between that individual’s in¬ 
ternal controller and the outside environment (Pfeifer and 
Bongard, 2006). This work outlines the specific hypothesis 
that the body’s importance, afforded to it by its role as the 
connection between internal desires for action and the exter¬ 
nal consequences of them (as well as external events and the 
internal sensory observations of them), is understated.
Without a well-established and properly functioning communication
channel, the sensory information and motor commands
of an individual are ineffective. 

From this supposition, we can create a testable hypoth¬ 
esis about the value of the established morphological com¬ 
munication channel. Specifically, control optimization on 
an existing morphology can be more effective than morpho¬ 
logical optimization on a fixed controller - as the latter does 
not maintain an established communication framework from 
the controller to the environment (through the morphology). 
This results in a system which effectively causes large, un¬ 
intended variations in the behavior of the controller, as its 
physical interface is constantly being scrambled while opti¬ 
mization seeks to improve the physical shape of the body. 

In comparing each of these hypothetical situations to the 
current state of evolved virtual creatures, we will conclude 
by discussing a possible connection between this theory of 
embodied cognition and the lack of effective optimization. 
Our hope is that such evidence will shed additional light on 
(at least one of) the problem(s) facing our field, and arm us
with the information to help tackle it in future works. 

Background 

The literature on failed attempts to co-optimize the mor¬ 
phology and control of virtual creatures is sparse. This 
may be due in part to the bias against publishing negative 
results (both in submission and acceptance of such
findings) (Fanelli, 2011). However, informal conversations with
members of the field acknowledge the lack of progress. We
note the difficulty of optimizing morphologies in our own 
virtual creatures (Lipson and Pollack, 2000; Bongard and 
Pfeifer, 2003; Cheney et al., 2014, 2015) (and unpublished 
works), but find ourselves grasping for an understanding of 
why this may be the case. 

One clear and concise description of this very problem is 
expressed in Joachimczak et al. (2016), where they note: 

It can also be observed how during the first 100 gener¬ 
ations of the evolutionary run, morphological changes 
occurred very frequently. At generation 125, the over¬ 
all morphology of the best individual already resembles 
the best final individual found in the generation 1386 
(although its fitness is only 5.07, compared to the 11.15 
of the latter). The following generations bring multiple 
small changes to the morphology of adult form and al¬ 
most no changes to the larval form. Both stages, how¬ 
ever, undergo continuous modifications of their con¬ 
trollers, and it is these alterations that contribute the 
most to the improvements in fitness. This pattern was 
also observed in other evolutionary runs: the final 
morphology would emerge in the first few hundreds 
of generations and the remainder of the run would 
be spent on small tweaks to the bodies and
optimization of controllers. (emphasis added)


This notion of premature convergence of morphology is 
not a stand alone case. At times this premature convergence 
can be incorrectly interpreted as a positive trait, noted as di¬ 
versity of results (despite the lack of explicit diversity main¬ 
tenance), as in Cheney et al. (2013). 

In the remainder of this work we set out to reproduce 
the symptoms described in Joachimczak et al. (2016), where 
morphology converges prior to control. We seek to further 
examine and characterize this phenomenon, and describe a 
theoretical framework which may help to explain its cause. 

Methods 

Similarly to Joachimczak et al. (2016), we employ soft 
robots as our instantiation of evolved virtual creatures. We 
use 3D voxel-based soft robots, following from Cheney 
et al. (2013), but replace their discrete muscle types and 
synchronized contractions with voxels which allow individ¬ 
ualized phase offsets, consistent with the controllers used 
in Joachimczak et al. (2016). This allows for behaviors such 
as propagating waves, which were not possible in Cheney 
et al. (2013) (but were achieved in Joachimczak et al. (2016) 
and Cheney et al. (2014)). A global frequency of oscillations 
is also optimized. 

Dual-Network CPPN 

We genetically encode the soft robot phenotypes as a 
network, inspired by the CPPN-NEAT (Stanley, 2007), 
the algorithm employed by both Cheney et al. (2013) 
and Joachimczak et al. (2016) (though the latter cleverly
employs the CPPN alongside development, rather than as an
alternative to it). However, this work differs from those two by
optimizing two separate networks, one containing only the 
outputs associated with the physical structure and material 
placement (“morphology”) of the creatures, while the sec¬ 
ond network produces only the outputs used to determine the 
actuation of the muscle voxels (“control”). This allows us to 
very clearly make variations to either the morphology or the 
controller, without affecting the genotype of the other.¹

To translate the CPPN genotype to a soft robot phenotype,
for each individual voxel in our 7 × 7 × 7 discretized
design space, the “presence” output of the morphology
network is queried. If the output value (which all span the range
[-1, 1]) is positive, a voxel is placed there and the
“material type” output is queried. If the “material type” output is
positive as well, then the voxel is an active “muscle” cell;
otherwise, that voxel is a passive “tissue” cell.
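
The decoding rule above can be summarized in a short sketch. It is illustrative only: the morphology network is stood in for by a hypothetical callable returning the two outputs for integer grid coordinates, whereas an actual CPPN would be queried with its own input encoding.

```python
from itertools import product

GRID_SIZE = 7  # the 7 x 7 x 7 design space


def decode_voxel(presence, material_type):
    """Map the two morphology outputs (each in [-1, 1]) to a voxel kind."""
    if presence <= 0:
        return None  # no voxel is placed at this location
    return "muscle" if material_type > 0 else "tissue"


def decode_morphology(morphology_network, grid_size=GRID_SIZE):
    """Query a (hypothetical) callable network at every grid location.

    morphology_network(x, y, z) must return (presence, material_type).
    Returns a dict mapping (x, y, z) to "muscle" or "tissue".
    """
    body = {}
    for x, y, z in product(range(grid_size), repeat=3):
        presence, material_type = morphology_network(x, y, z)
        kind = decode_voxel(presence, material_type)
        if kind is not None:
            body[(x, y, z)] = kind
    return body


def toy_network(x, y, z):
    """Stand-in for an evolved CPPN: a two-layer slab, muscle where x < 3."""
    return (1.0 if z < 2 else -1.0, 1.0 if x < 3 else -1.0)


body = decode_morphology(toy_network)
n_muscle = sum(1 for kind in body.values() if kind == "muscle")
```

Because the material output is only consulted where a voxel is present, changes to the presence pattern alone can alter how many muscle cells are expressed.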

For each active muscle cell, the control network is
queried, and the floating point value of the “phaseOffset”
output (again from [-1, 1]) is assigned as the relative phase
offset of that muscle cell (where 0 is exactly in phase with a
global clock, -1 and 1 are synchronized a full phase ahead
or behind it, and -0.5 and 0.5 are perfectly out of sync with
¹ Both source code and resulting data are available upon request.
it). Finally, the frequency of this global “clock” oscillator is
set using the mean value of the “frequency” output across all
voxels (including those not currently expressed in the
phenotype). In order to easily allow the full range of possible
frequencies to be expressed after averaging, a mean value of
-0.5 or lower corresponds to the minimal frequency of 5 Hz,
while a mean value of 0.5 or higher corresponds to the
maximal frequency of 10 Hz (with linear scaling between them),
despite the continued [-1, 1] range of each individual
“frequency” output node. The optimization of the global
oscillation speed is intended to allow the muscle actuations to
resonate with the natural frequency of a given morphology.
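
The mapping from the averaged “frequency” output to the global oscillation frequency amounts to a clipped linear rescaling. A minimal sketch follows; the function and parameter names are hypothetical.

```python
def global_frequency(frequency_outputs, f_min=5.0, f_max=10.0):
    """Map the mean of the per-voxel "frequency" outputs to a frequency in Hz.

    A mean of -0.5 or lower gives f_min, a mean of 0.5 or higher gives f_max,
    with linear scaling in between.
    """
    if not frequency_outputs:
        raise ValueError("at least one output value is required")
    mean = sum(frequency_outputs) / len(frequency_outputs)
    clipped = min(0.5, max(-0.5, mean))
    return f_min + (clipped + 0.5) * (f_max - f_min)


slowest = global_frequency([-1.0, -1.0])   # 5.0 Hz
middle = global_frequency([0.0, 0.0])      # 7.5 Hz
fastest = global_frequency([1.0, 1.0])     # 10.0 Hz
```

Saturating at ±0.5 rather than ±1 keeps the extreme frequencies reachable even though averaging over many voxels pulls the mean towards the centre of the output range.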

We should note that this encoding does allow for morpho¬ 
logical changes to affect the expressed control (as the addi¬ 
tion or removal of muscle cells will allow more or less of the 
underlying phase offset pattern to be expressed in the pheno¬ 
type). Due to the integrated and embodied nature of control, 
we believe that such an effect would happen with various 
definitions of “morphology” and “control” - such as a robot 
with 6 legs expressing a different number of joint control 
outputs than a 4 legged robot in the rigid body paradigm. 
This concept of morphology determining the expression of 
control may be less about this specific implementation and 
instead a more general consequence of embodied cognition 
in a situated creature (Pfeifer and Bongard, 2006). 

Physics Simulation in VoxCad 

Consistent with Cheney et al. (2013), we employ the open- 
source soft-body simulator VoxCad (Hiller and Lipson, 
2014) as the physics engine which determines the fitness of 
each creature’s phenotype. In order to normalize the number 
of actuations per muscle cell across creatures with different 
actuation frequencies, each individual is evaluated for ex¬ 
actly 20 actuation cycles (following a passive initialization 
period in which it is allowed to settle on the ground in a 
relaxed pose - intended to discourage passive falling strate¬ 
gies rather than active locomotion behaviors). This means 
that two creatures with different actuation frequencies will 
be simulated for different lengths of time. Following the 
termination of the simulation, the displacement of the crea¬ 
ture’s center-of-mass along the positive x axis is returned to 
the evolutionary algorithm. All other parameters regarding 
VoxCad simulation are taken from Cheney et al. (2013). 
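
As a small worked example of this normalization, the active part of an evaluation lasts 20 cycles divided by the actuation frequency. The sketch below assumes the frequency is expressed in cycles per second and ignores the passive settling period, whose length is not stated here; the names are hypothetical.

```python
ACTUATION_CYCLES = 20  # number of actuation cycles per evaluation


def actuation_time_seconds(frequency_hz, cycles=ACTUATION_CYCLES):
    """Simulated time needed to complete a fixed number of actuation cycles."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return cycles / frequency_hz


slow_creature = actuation_time_seconds(5.0)    # 4.0 seconds at 5 Hz
fast_creature = actuation_time_seconds(10.0)   # 2.0 seconds at 10 Hz
```

A creature actuating at the minimal frequency is therefore simulated for twice as long as one at the maximal frequency, while both perform the same number of muscle contractions.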

Evolutionary Algorithm 

The optimization of these soft robots takes the form of an 
evolutionary algorithm. The genotype is a directed acyclic 
graph, represented in memory as a tree to allow an imple¬ 
mentation similar to that of genetic programming. Follow¬ 
ing from CPPNs (Stanley, 2007), each node in the graph 
sums its weighted inputs and feeds them through a series of 
nodes with geometric activation functions (here: sigmoid, 
sine, absolute value, negative absolute value, square, square 
root, or negative square root) to arrive at each of its output 


value(s). The inputs to this network are Cartesian ( x,y,z ) 
and polar (r) coordinates of the voxel in question, along with 
a bias node. The outputs are interpreted as described above. 

Variation and selection follow a $(\mu/\rho + \lambda)$ scheme of
(50/25 + 25). Variations may be: the addition/removal of
a node to a network, addition/removal of an edge between
existing nodes, mutation of the weight associated with an
edge, or mutation of a node’s activation function. Each of
these variations occurred with equal probability, and each
variation was applied to only one of the two networks, chosen
with equal probability. Crossover was not considered in this
work. Variations to the genotype were only considered valid
if they resulted in a phenotypic change in the resulting soft
robot. Variations were also disallowed if they resulted in
creatures that occupied less than 10% of the available voxels,
or employed less than 5% of the available voxels as
actuated muscle cells. Selection was rank-based with elitism.
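
The two size constraints on variations can be expressed as a simple validity check. In this sketch a body is assumed to be a dictionary mapping voxel coordinates to "muscle" or "tissue" (a hypothetical representation), and the thresholds are taken relative to the 343 locations of the 7 × 7 × 7 design space.

```python
def is_valid_body(body, grid_size=7, min_fill=0.10, min_muscle=0.05):
    """Check the minimum occupancy and minimum muscle constraints.

    body: dict mapping (x, y, z) to "muscle" or "tissue" (hypothetical format).
    Both fractions are taken relative to the number of available voxels.
    """
    available = grid_size ** 3
    n_voxels = len(body)
    n_muscle = sum(1 for kind in body.values() if kind == "muscle")
    return n_voxels >= min_fill * available and n_muscle >= min_muscle * available


def make_body(n_voxels, n_muscle):
    """Build a toy body with the requested voxel and muscle counts."""
    return {
        (i % 7, i // 7, 0): ("muscle" if i < n_muscle else "tissue")
        for i in range(n_voxels)
    }


large_enough = is_valid_body(make_body(35, 18))   # 35 >= 34.3 and 18 >= 17.15
too_small = is_valid_body(make_body(34, 18))      # fewer than 10% of 343 voxels
too_passive = is_valid_body(make_body(35, 17))    # fewer than 5% muscle voxels
```

With 343 available locations, the constraints translate to at least 35 voxels in total and at least 18 muscle voxels.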

Statistical Reporting 

All experimental data below represent the mean values of 
30 independent runs lasting for 5000 generations each. P- 
values are calculated using a Mann-Whitney rank-sum test, 
as we cannot assume normality of fitness values. Confidence 
intervals were plotted using bootstrapping of 10,000 sam¬ 
ples at the 95% confidence level. Significance values are 
marked with the following convention: ns for p > 0.05, 
* for p < 0.05, ** for p < 0.01, and *** for p < 0.001.
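
The reporting conventions above can be sketched with the standard library alone. The bootstrap below uses the percentile method on the mean with a fixed random seed, which is an assumption (the exact bootstrap variant is not specified), and a p-value of exactly 0.05 is treated as not significant. Function names are hypothetical.

```python
import random


def significance_marker(p_value):
    """Translate a p-value into the marker convention used for reporting."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "ns"


def bootstrap_mean_ci(values, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean (assumed variant)."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)
    )
    lower_index = int((1.0 - level) / 2.0 * n_resamples)
    upper_index = int((1.0 + level) / 2.0 * n_resamples) - 1
    return means[lower_index], means[upper_index]


# Toy final fitness values for 30 runs (illustrative data only).
fitness = [float(i % 5) for i in range(30)]
low, high = bootstrap_mean_ci(fitness)
```

A rank-sum test such as Mann-Whitney would supply the p-values passed to `significance_marker`; it is omitted here to keep the sketch dependency-free.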

Results 

First and foremost, we set out to replicate and examine the 
results found in Joachimczak et al. (2016), where “the
final morphology would emerge in the first few hundreds of
generations and the remainder of the run would be spent on 
small tweaks to the bodies and optimization of controllers.” 

By visually inspecting the resulting creatures we find 
that this implementation appears able to reproduce the phe¬ 
nomenon. Fig. 1 shows the optimization over time of the 10 
best performing trials. Notice how conserved the morpholo¬ 
gies appear to be over time, with the gross morphology gen¬ 
erally emerging at or before the 100 generation mark (mid¬ 
dle column). While only the top 10 trials are shown for sake 
of space, this theme applies generally to all the runs. 

It is also interesting to note that the top two final-fitness- 
achieving runs were the only two to undergo a morphologi¬ 
cal change between generations 1000 and 5000 (the last two 
columns). This suggests that creatures to which search im¬ 
mediately converges upon are not optimal, and that better 
performing solutions may not be that far away in pheno¬ 
typic space (inferred from the similarities between the top 
two rows at generations 1000 and 5000), yet such creatures 
appear to be difficult for this search process to find (inferred 
by the lack of occurrence before generation 1000 in the top 
two runs, and at all in the next 8 runs). The idea that each
run converges to a local, rather than global, optimum is also
evident from the fact that the final creatures differ from
one another, rather than converging to the same form.

Figure 1: Evolved morphologies at various stages in optimization
(voxel color from red to green indicates phase offset
of controllers). Each row represents one of the top 10 runs
(out of 30, ordered by final fitness). Each column represents a
point in time during optimization. Note that morphologies
generally lock in before gen 100, often on simple forms.

This visualization serves as an initial indication that the 
effect of early convergence is apparent in our setup, as it was 
in Joachimczak et al. (2016). However, it does not demon¬ 
strate that the effect of stagnation is more prominently fea¬ 
tured by morphology than controllers, or characterize just 
how detrimental such an effect may be. These two questions 
are both approached quantitatively in Figs. 2 and 3. 

To quantify how early the morphology converges and how 
detrimental this may be towards the optimization of virtual 
creatures, we artificially freeze the morphology after a given 
amount of time, and only allow control variations to occur 
after this point. If the resulting fitness value does not show a 
significant change following a morphology freeze at a given 
time (compared to optimizing both the morphology and control
for the entire optimization process), we can be confident
that the morphology did not significantly contribute to fitness
improvements after that point in optimization time.

Figure 2: Fitness impact of freezing morphology at various
points in optimization. Both morphology and control
are optimized up to the freezing point. After it, only control
variations are considered for the remainder of the trial.
The p-values (and significance markers) reported compare
the resulting fitness to that achieved with co-optimization
of both morphology and control for the full 5000 generations.
Note that morphologies optimized for 25 or more generations
show no significant fitness difference, compared to
those optimized for all 5000 (noted above in bold).

Figure 3: Fitness impact of freezing control at various points
in optimization. Note that controllers with fewer than 250
generations of optimization (but full morphological optimization)
show a significant difference from those optimized
for all 5000 gens, suggesting that control mutations continue
to provide fitness benefits further into optimization than the
morphology variations, which cease to be beneficial to final
fitness values much earlier (cf. Figure 2, generation 25;
please note the different x-axis compared to that figure).
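
The freezing treatment described above can be sketched as follows. This is a toy illustration under explicit assumptions: a simple (1+1) hill climber stands in for the rank-based, elitist population search used in the experiments, genomes are plain lists of floats rather than CPPNs, and the names `evolve`, `freeze_morph_at`, and `toy_fitness` are hypothetical.

```python
import random


def evolve(fitness, n_morph, n_ctrl, generations, freeze_morph_at=None, seed=0):
    """Hill-climb a (morphology, control) pair.

    From generation `freeze_morph_at` onward, only control genes are mutated.
    """
    rng = random.Random(seed)
    morph = [rng.random() for _ in range(n_morph)]
    ctrl = [rng.random() for _ in range(n_ctrl)]
    best = fitness(morph, ctrl)
    for gen in range(generations):
        frozen = freeze_morph_at is not None and gen >= freeze_morph_at
        new_morph, new_ctrl = list(morph), list(ctrl)
        if frozen or rng.random() < 0.5:
            i = rng.randrange(n_ctrl)       # control variation
            new_ctrl[i] += rng.gauss(0, 0.1)
        else:
            i = rng.randrange(n_morph)      # morphological variation
            new_morph[i] += rng.gauss(0, 0.1)
        f = fitness(new_morph, new_ctrl)
        if f >= best:                       # keep the child if it is no worse
            morph, ctrl, best = new_morph, new_ctrl, f
    return morph, ctrl, best


def toy_fitness(morph, ctrl):
    """Stand-in objective: controllers score best when matched to the body."""
    return -sum((m - c) ** 2 for m, c in zip(morph, ctrl))
```

Comparing the final fitness of `evolve(..., freeze_morph_at=25)` against `evolve(...)` with no freeze, over many seeds, mirrors the comparison the text describes.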

Fig. 2 shows the fitness impact of morphology freezes 
at various times during optimization compared to co¬ 
optimization of morphology and control for the entire 5000 
generations. We see that full optimization does not show a 
significant fitness improvement compared to morphologies 
optimized for 25 generations or more (at the 95% confidence
level, as p > 0.0604 for all freezing points ≥ 25 gens). This
means that the morphological variations after generation 25 
do not significantly contribute to the fitness of the resulting 
creatures, suggesting that morphology converges to (near) 
final forms by generation 25. The visual inspection of these 
creatures in Fig. 1 does not contradict such a suggestion. 

This does not mean that optimization as a whole has converged
at this point. Improvements from control optimization
occurring after the final gross morphology is fixed are
noted in Joachimczak et al. (2016). We also see this effect 
here, with the fitness resulting from control optimization af¬ 
ter morphology freezing (26.520) significantly outperform¬ 
ing (p < 0.001) the fitness at the time of freezing (21.157). 

Fig. 3 shows the impact of the converse treatment, in 
which the creature’s controller is frozen at a given point in 
time and only morphological variations are allowed thereafter.
This treatment shows that significant differences in
resulting fitness values occur for at least 100 generations (at
the 95% confidence level, as p < 0.001 for freezing points
≤ 100), but not more than 250 generations (p > 0.0565
for freezing points ≥ 250). The lack of significant difference
past 250 generations also points to convergence
of controllers to (near) final levels early in optimization.

However, the significant drop in fitness from control 
freezing (at times past those when morphological change 
stops contributing to final fitness values) suggests that this 
example of virtual creature evolution creates earlier conver¬ 
gence for morphologies than it does for controllers. 

This picture is further reinforced when we examine the 
time of convergence to a final morphology and controller in 
each run. On average, the convergence to the final (best of 
run) morphology occurs at generation 558. In comparison 
the mutation which leads to the best-ever controller occurs 
significantly later (p < 0.001) at generation 2926. Widening 
our view from only the final successful variations, and con¬ 
sidering all individuals who were the top fitness performers 
at some point during optimization, we see the same story, 
with controller mutations leading to top performers contin¬ 
uing significantly later than those created by morphological 
mutations (mean of gen 750 vs. gen 158, p < 0.001). The 
next section will discuss a potential cause for such an effect. 

Discussion 

The above results suggest that, in this instance of virtual 
creatures co-evolving morphology and control, we run into a 
problem of premature convergence, which is especially pronounced
with regard to the morphology of the creature. Premature
convergence alone could point to issues in any number
of aspects of optimization (diversity maintenance, genetic
encoding, etc.). However, the difference between optimization
effectiveness of morphology and control draws our
attention towards the theory of embodied cognition.

Let’s revisit the concept of the morphology as the inter¬ 
face between the control architecture and its effect on the 
environment. This suggests that modifications to an agent’s 
morphology will not only change the shape of its body, but 
also change the way in which its control architecture affects 
the environment, since the commands sent by that controller 
will now be interpreted differently - as it affects the actua¬ 
tors of a different body layout. Thus mutations to the mor¬ 
phology of a creature will have the effect of also “scram¬ 
bling” its controller (causing variation in it) as well. 

Contrary to the chain reaction effect of morphological 
mutations, variations which occur to the controller do not 
affect any part of the morphology’s relationship with the 
outside environment. While the control signals which the 
body is receiving may change, these new commands are still 
executed in the same framework and “language” as previ¬ 
ous commands were. The organization and path of infor¬ 
mation from controller through morphology to environment 
causes variations in the morphology to propagate upstream 
(i.e. affecting the controller/morphology interface in addi¬ 
tion to the morphology/environment interface), while varia¬ 
tions to the controller do not propagate downstream (affect¬ 
ing the control/morphology interface, but not the morphol¬ 
ogy/environment interactions). 

This feature of embodied cognition has the effect of creating
larger (and arguably more unpredictable) behavioral
changes in response to similar-sized variations to the “morphology”
genome than to the “controller” genome. This effect would
lead to a more rugged fitness landscape in the space of 
morphologies (for a given controller) than exists in the fit¬ 
ness landscape of controllers (for a given morphology). We 
would then predict that a more rugged landscape would lead 
to more local optima and less efficient optimization with 
quicker convergence to sub-optimal solutions than in a less
rugged landscape (Kauffman, 1993). This is consistent with
what we have experienced thus far with the optimization of 
morphology converging prior to control. 
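
The link between ruggedness and the number of local optima can be illustrated with a small NK landscape in the spirit of Kauffman (1993). This sketch is not part of the experiments reported here: the NK model, the parameter values, and the function names are assumptions chosen only to show that stronger interactions between genes (larger K) tend to produce more points at which a single-mutation hill climber would stall.

```python
import itertools
import random


def make_nk_landscape(n, k, seed=0):
    """Build an NK fitness function; locus i interacts with its k right-hand neighbours."""
    rng = random.Random(seed)
    tables = [
        {key: rng.random() for key in itertools.product((0, 1), repeat=k + 1)}
        for _ in range(n)
    ]

    def fitness(bits):
        return sum(
            tables[i][tuple(bits[(i + j) % n] for j in range(k + 1))]
            for i in range(n)
        ) / n

    return fitness


def count_local_optima(n, k, seed=0):
    """Count genotypes that no single bit flip can improve."""
    fitness = make_nk_landscape(n, k, seed)
    score = {g: fitness(g) for g in itertools.product((0, 1), repeat=n)}
    count = 0
    for g, f in score.items():
        neighbours = (g[:i] + (1 - g[i],) + g[i + 1:] for i in range(n))
        if all(score[nb] <= f for nb in neighbours):
            count += 1
    return count


smooth = count_local_optima(n=10, k=0)   # additive landscape: a single peak
rugged = count_local_optima(n=10, k=5)   # epistatic landscape: typically many peaks
```

With K = 0 every gene contributes independently, so there is a single optimum; raising K makes the effect of one mutation depend on other genes, which is the analogue of a morphological change altering how the existing controller acts on the environment.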

Potential Causes and Limitations 

There are undoubtedly features of this experimental setup 
which may cause us to overstate (or understate) the impor¬ 
tance of embodied cognition compared to other instances. 
Firstly, this setup employs soft robots, which are notoriously 
compliant and adaptive to a wider variety of environmental 
conditions than their rigid body counterparts (Trivedi et al., 
2008). Given that adaptability of this robot-environment in¬ 
terface (in our case to unexpected perturbations in control 
signals), it’s possible that soft robots dampen this effect. In 





the extreme, one may conjecture that the soft robot paradigm 
is so compliant that almost any morphology can adequately 
move along flat ground. If this is the case, then it would not 
be surprising that freezing the morphology on an arbitrary 
shape has little effect on the resulting fitness value. As soft 
robots are relatively new to the literature, this may explain 
why this effect has been unnoticed previously. 

In order to further explore this facet, we produced an al¬ 
ternative fitness function which explicitly selects for shape 
(adding a term to minimize the number of actuated voxels or 
“energy”). In the extreme this would produce creatures with 
minimal muscle cells, though since actuated cells directly 
contribute to locomotion ability, a complex trade-off creates 
an incentive for specialized energy-efficient morphologies. 
Another way to incentivize specialized morphologies would
be to evaluate the robot in a more complex (and morphologically
dependent) task environment than flat ground.

Performing the same “freezing” tests on creatures evolved 
under the alternative fitness criteria, we see that freezing
morphology continues to show a non-significant effect on 
fitness at times when control freezing produces a significant 
fitness drop (e.g. gen 50). Fig. 4 visually shows the continued
convergence to final gross morphologies (with morphologies 
at gen 50 generally mirroring those found at gen 5000), as 
well as the added morphological dependence of the task - as 
the morphologies demonstrated here visually appear more 
complex than the more fully occupied shapes in Fig. 1. 

In this treatment, we also see the final controllers appearing
significantly later (gen 2968) than the final morphologies
(gen 419, p < 0.001). This is also seen in the average best- 
so-far individuals, with those produced by control mutations 
continuing to appear significantly later on average (gen 709) 
than those produced by morphological variations (gen 119, 
p < 0.001). This data suggests that while the original task 
was not as “morphologically dependent” as others, the find¬ 
ings still hold in a scenario which puts more of an emphasis 
on morphological optimization. 

A second aspect which may contribute to this effect is
the size of the search space. These runs use robots of
size 7x7x7. As each of these 343 voxels can take one
of three states (empty, actuated, or passive), this results
in $3^{343} \approx 4.5 \times 10^{163}$ distinct morphological phenotypes.
It’s possible that the difficulty in searching the morphology
space is due in part to its size. This could explain why this
effect was not seen sooner (as previous work in evolutionary
robotics heavily favors legged morphologies with low degrees
of freedom). This phenotype is indirectly encoded, but generative
in different ways than previous work evolving morphology
(Sims, 1994b; Lehman and Stanley, 2011).
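
The size of the morphology space quoted above can be checked with a short calculation. The helper name `scientific` is hypothetical; the only inputs are the 7x7x7 workspace and the three voxel states named in the text.

```python
import math


def scientific(n_states, n_voxels):
    """Return (mantissa, exponent) such that n_states**n_voxels ~ mantissa * 10**exponent."""
    log10_value = n_voxels * math.log10(n_states)
    exponent = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exponent)
    return mantissa, exponent


n_voxels = 7 * 7 * 7                        # 343 voxels in the workspace
mantissa, exponent = scientific(3, n_voxels)
print(f"3^{n_voxels} ~ {mantissa:.1f} * 10^{exponent}")
```

Python integers are arbitrary precision, so `3 ** 343` can also be computed exactly and its digits counted as a cross-check.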

In attempting to reproduce the work from Joachimczak 
et al. (2016), we optimize phase offset and frequency for an 
oscillating actuation as the control parameters. These values 
are encoded by floating point numbers, and thereby create a 
continuous (theoretically infinite) search space for control. 


[Figure 4 panels: (a) Generation 50; (b) Generation 5000]

Figure 4: Stagnation shown in the top 10 morphologies un¬ 
der the distance/energy fitness treatment. Note the similar¬ 
ity in gross morphologies from gen 50 (top) to gen 5000 
(bottom). The top performing creature shows the largest 
change between these points, with the new morphology ar¬ 
riving from a mutation at gen 53. Also note the variance 
and complexity in forms, compared to Fig. 1, suggesting the 
added morphological dependence of this fitness function. 


The concept of discrete physical cells creating a morphol¬ 
ogy and real valued control parameters (such as neuronal 
synapse weights) fits biologically - but the differing search 
spaces give us pause from an optimization perspective. 

To create a similar scenario where the size of the controller
search space was smaller than that of the morphology,
we borrow the system of two distinct “muscle types” from Cheney
et al. (2013). This allows just two offset control states
(implemented by rounding the continuous phase offset values)
to create a search space of size $2^{343} \approx 1.8 \times 10^{103}$
(smaller than the morphology space). In this set of trials,
we see the above effect disappear, and morphology no
longer appears to be more difficult to optimize than “con¬ 
trol”. Here, the final morphological innovation of each run, 
on average, occurs at generation 665, while control innova¬ 
tion continues only to gen 795 - an insignificant difference 
(p = 0.149). Similarly, the point at which freezing morphol¬ 
ogy causes a non-significant difference in resulting fitness 
values no longer occurs before that of controller freezing. 
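
The two-state control treatment can be sketched as a rounding step applied to each voxel's phase offset. This is an assumption-laden illustration: the text states only that the continuous values are rounded, so the normalization of offsets to [0, 1], the 0.5 threshold, and the function names are hypothetical.

```python
def discretize_phase(phase_offset):
    """Round a phase offset, assumed normalized to [0, 1], to state 0 or 1."""
    return 1 if phase_offset >= 0.5 else 0


def two_state_controller(phase_offsets):
    """Apply the rounding to every voxel's phase offset."""
    return [discretize_phase(p) for p in phase_offsets]


n_voxels = 7 * 7 * 7
n_controllers = 2 ** n_voxels     # two control states per voxel
n_morphologies = 3 ** n_voxels    # empty, actuated, or passive per voxel
```

Under this discretization the control space (`n_controllers`) is strictly smaller than the morphology space (`n_morphologies`), which is the property the treatment is designed to create.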

However in this scenario, the line between “morphol¬ 
ogy” and “control” becomes very blurry. In practice, a two- 
oscillator-actuation system can be viewed as the placement 
of cells of these two types (a “morphological” concept) more 
so than the fitting of phase offset parameters to a predefined
placement of muscles (a “control” concept). Thus one
could easily argue that the two discrete-phase-offset system
from Cheney et al. (2013) should be considered to be entirely
morphological optimization, with little to no control to
be optimized (as is argued in that paper), and thus immune
from our embodied cognition argument.

This is representative of a larger “problem” of this CPPN 
oscillating actuation setup: that there may not be a clean dis¬ 
tinction between “morphology” and “control” to be made, 
and such divides may be arbitrary labeling. In our example, 
one could argue that the output node denoting if a cell is ac¬ 
tuated or passive should belong in the “morphology” CPPN, 
as it denotes the placement of different types of cells (“muscles”
or “tissues”). But another person could argue equally
well that this output belongs in the “control” CPPN, since it 
does not change the actual shape or stiffness of the creature, 
and only informs where actuations do or do not occur. 

The point here is that virtual creatures are situated and 
embodied, and thus ideas like embodied cognition or mor¬ 
phological computation (Pfeifer and Gomez, 2009) suggest 
that there isn’t a clear cut distinction or dualism between two 
separate pieces (the body and the brain), but rather a single 
integrated and embodied agent. Therefore we need to con¬ 
sider the tight coupling and interdependencies of the “mor¬ 
phology” and “control” and consider holistic effects when¬ 
ever we attempt to modify a single part of the system. 

Future Work 

The results shown in this work are specific only to this in¬ 
stance and experimental setup. Thus, many more instances 
of this approach (artificially separating morphology from 
control and freezing each to measure their independent im¬ 
pact on fitness) would need to be attempted on different ex¬ 
perimental setups to extrapolate from this single instance. 
This should ideally include different: morphological encodings
(such as the generative block encodings used by Sims
(1994b)); control architectures (perhaps complexifying to
neural nets rather than simplifying to discrete oscillations 
as we did in our follow up tests - or employing closed-loop 
control, which may help controllers to adapt to new mor¬ 
phologies); evolutionary algorithms (especially those with 
a strong emphasis on diversity); tasks (increasing environ¬ 
mental complexity); and/or scales (as increased scales of 
a cellular creature more closely approximate a “continuous”
morphology - which comes with various benefits and costs).

Regarding the distinction between “morphology” and 
“control”, this work necessarily chooses a logical splitting 
point between the two: representing CPPN outputs that dic¬ 
tate placement of voxels as “morphology” and outputs that 
dictate voxel size changes as “control”. But this distinction 
is far from black and white. Future work should explore var¬ 
ious groupings of outputs into the categories of “morphol¬ 
ogy” and “control” (or any grouping names), and examine 
the effect that such distinctions produce on these results. 

The central issue to this paper can be viewed as a prob¬ 
lem stemming from the dynamic coupling of control on 
morphology, with different morphologies creating hills and 


valleys in the fitness landscape of controllers. As in any 
multi-modal landscape, diversity maintenance during search 
is crucial. This includes diversity coming from crossover 
(omitted here), or from any existing diversity maintenance 
method. However, informed by this paper, we would be wise
to notice that, since the hills and valleys of this landscape
may be imposed by the morphology onto the controller, diversity
maintenance would do best to focus on protecting diversity
within morphologies, so as to encourage morphological
variations (despite their adverse effects on control).

The most important future work would involve potential
solutions to this problem. Initial results from such
work already suggest that our understanding of embodied
cognition, and the finding of especially poor mutation
success for morphological variations, can inform improved
search methods. Specifically, results employing a
multi-timescale model, in which controllers are given time
to re-adapt to their new situated forms after a morphological
mutation (and thus conform themselves to their new morphological
“communication channels”, thereby “unscrambling”
the detrimental effects of the morphological mutation)
before the value of the morphological variation is
evaluated, show an improved ability to optimize virtual
creatures compared to traditional methods. This is exactly
the type of diversity maintenance that focuses on protecting
innovations to the morphology specifically.
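
The multi-timescale idea can be sketched as an evaluation routine that lets the controller re-adapt before a new morphology is judged. The text gives no algorithmic details, so everything here is an assumption made for illustration: the control-only hill climber, the number of adaptation steps, the mutation size, and the names `readapted_fitness` and `toy_fitness`.

```python
import random


def readapted_fitness(fitness, morph, ctrl, adapt_steps, rng, sigma=0.1):
    """Hill-climb the controller on a fixed morphology, then report the result.

    Returns the re-adapted controller and its fitness, which is never lower
    than the fitness of the unadapted (morph, ctrl) pair.
    """
    best_ctrl, best = list(ctrl), fitness(morph, ctrl)
    for _ in range(adapt_steps):
        candidate = list(best_ctrl)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.gauss(0, sigma)
        f = fitness(morph, candidate)
        if f > best:
            best_ctrl, best = candidate, f
    return best_ctrl, best


def toy_fitness(morph, ctrl):
    """Stand-in objective: controllers score best when matched to the body."""
    return -sum((m - c) ** 2 for m, c in zip(morph, ctrl))


rng = random.Random(0)
new_morph, old_ctrl = [0.2, 0.8, 0.5], [0.9, 0.1, 0.5]
immediate = toy_fitness(new_morph, old_ctrl)            # judged with a "scrambled" controller
_, adapted = readapted_fitness(toy_fitness, new_morph, old_ctrl, 50, rng)
```

A search procedure would then compare morphological variants on `adapted` rather than `immediate`, so that a promising body is not rejected merely because its inherited controller no longer fits it.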

Ideally, further algorithmic improvements will occur from 
embracing the fundamental theory of embodied cognition, 
but the positive initial results noted here provide confirmation
that it’s possible and that the understanding gained from
this current work may contribute to future improvements. 

Conclusion 

We have examined a specific example of co-evolving mor¬ 
phology and control in virtual creatures. In this exam¬ 
ple, morphology prematurely converges: converging quicker 
than control, showing a lack of fitness benefits after as lit¬ 
tle as 25 of the 5000 generations, and with “optimal” final 
morphologies emerging significantly sooner than final con¬ 
trollers. We have suggested a theoretical basis, founded in 
the concept of embodied cognition, that could explain such an
obstacle and is consistent with the results we present. While
there is plenty of work still to be done to solidify this theory,
we conclude by suggesting future work based on our
newly proposed understanding, and note its striking poten¬ 
tial in early initial results. We hope this work will help to 
explain the difficulty we face in scaling the complexity of 
evolved virtual creatures, and will help inspire (combined 
with other efforts) a solution to our current stagnation. 
Abstract

The concept of morphological computation holds that the 
body of an agent can, under certain circumstances, exploit 
the interaction with the environment to achieve useful be¬ 
havior, potentially reducing the computational burden of 
the brain/controller. The conditions under which such a phenomenon
arises are, however, unclear. We hypothesize that
morphological computation will be facilitated by body plans 
with appropriate geometric, material, and growth properties, 
while it will be hindered by other body plans in which one or 
more of these three properties is not well suited to the task. 
We test this by evolving the geometries and growth processes 
of soft robots, with either manually-set softer or stiffer mate¬ 
rial properties. Results support our hypothesis: we find that 
for the task investigated, evolved softer robots achieve bet¬ 
ter performances with simpler growth processes than evolved 
stiffer ones. We hold that the softer robots succeed because 
they are better able to exploit morphological computation. 
This four-way interaction among geometry, growth, material 
properties and morphological computation is but one example 
phenomenon that can be investigated using the system here 
introduced, that could enable future studies on the evolution 
and development of generic soft-bodied creatures. 


Introduction 

Evolving complete and intelligent artificial creatures is one 
of the long-term goals of artificial life and evolutionary 
robotics researchers. More than two decades after the first 
pioneering attempts (Sims, 1994), we are still far from 
matching the complexity exhibited by even the simplest or¬ 
ganisms. Nevertheless, many insights have been gained to 
date, and many limitations overcome. 

Hand in hand with similar developments in robotics (Kim 
et al., 2013; Rus and Tolley, 2015), substantial steps for¬ 
wards in the complexity and interestingness of evolved vir¬ 
tual creatures have been recently made, by allowing evolu¬ 
tion to make use of soft materials (Hiller and Lipson, 2010, 
2012; Joachimczak and Wrobel, 2012; Joachimczak et al., 
2014; Cheney et al., 2013; Rieffel et al., 2014; Lessin and 
Risi, 2015). In addition to enhancing morphological and behavioral
diversity, the use of soft materials allows morphologies
that more closely mimic biological ones, thus enabling
the investigation of additional aspects of evolution and development
that were previously beyond reach. Here we focus
on two such aspects relevant to soft-bodied creatures.

Figure 1: A soft (a-d) and a stiff (e-h) robot evolved
to grow towards two lateral light sources. Red voxels
expand in response to environmental stimuli, blue ones
shrink. While the soft robot only employs expanding voxels
and effectively exploits morphological computation, passive
dynamics, and the interaction with the environment
to solve the task, the stiff one is prevented from doing so
due to its unsuitable material properties, and thus had to
evolve a more complex and active form of control in order
to achieve the same result. See them in action at:
https://youtu.be/Cw2SwPNwcfM

The first regards morphological plasticity: the ability 
to change some aspects of the body during one’s life¬ 
time. Here we investigate environment-mediated morpho¬ 
logical development — growth in response to environmen¬ 
tal stimuli — referred to henceforth simply as growth. Al¬ 
though it has been shown that morphological growth can 
provide adaptive advantages for machines (Bongard, 2013), 
previous work only focussed on rigid-bodied agents and 
environment-insensitive growth processes. Yet there is ev¬ 
idence that biological development is influenced and driven 
by the environment. For example, plant roots follow gradients
of nutrients in the soil, while human bones and tissues
alter their properties in response to mechanical loading
(Wolff, 1986). Moreover, when compared to rigid-bodied
creatures, soft ones more naturally allow for some forms of
morphological plasticity that are already within the reach
of soft robotics technology as well. Despite that, probably
due to the lack of a general understanding of why, when,
and how these new capabilities should be exploited, these
robots to date feature only basic forms of morphological
plasticity (Shepherd et al., 2011; Corucci et al., 2015b,c), or no
plasticity at all (Calisti et al., 2015; Corucci et al., 2015a;
Cacucciolo, Corucci et al., 2014).

The second aspect we explore is the influence of material 
properties on the evolution and development of adaptive be¬ 
havior. The behavior of soft-bodied creatures is to a large 
extent determined by their material properties, yet in soft 
robotics these are often fixed a priori and for the whole life¬ 
time of an agent, perhaps after a limited number of heuris¬ 
tic tests. Some recently proposed ideas suggest that those 
properties — and softness in particular — can have implica¬ 
tions for the development of intelligent behavior (Pfeifer and 
Bongard, 2006), but few studies (Nakajima et al., 2013) and 
theoretical frameworks (Hauser et al., 2011, 2012) elucidate 
and quantify these implications, typically not embracing an 
evolutionary and developmental approach. 

Here we investigate the evolution and development of soft 
robots. In this context we hypothesize that with the right 
combination of geometry, material properties, and growth 
process, a robot can exploit morphological computation 
(Paul, 2006; Hauser et al., 2014) better than one for which 
one or more of these aspects is not well adapted. We test 
this hypothesis by evolving the body plans and developmen¬ 
tal trajectories of simulated soft robots for a phototaxis task. 
Two variations of such robots are evolved: softer and stiffer 
ones. We find that the former achieve better performances, 
despite their evolved growth processes being simpler, ac¬ 
cording to an information theoretic measure. By interpreting 
environment-mediated growth as a form of control, it is sug¬ 
gested that the softer robots are in fact exploiting more mor¬ 
phological computation than the stiffer ones. This hypoth¬ 
esis is but one of many that can be tested using the system 
here introduced. These include the investigation of general 
relationships among morphology, control, evolution, and de¬ 
velopment. Such studies could provide a deeper comprehen¬ 
sion of biological systems, with potential implications for 
the development of more complex, autonomous and adap¬ 
tive machines. 

Methods 

Simulated task and environment. In this work soft- 
bodied creatures are simulated in the VoxCad environment 
(Hiller and Lipson, 2014). A number of changes and new 
features have been introduced in the simulator in order to 
enable our experiments. 

First, sources of environmental stimuli can now be added 
to the environment. These sources are characterized by a 
fixed 3D location, and a robot’s voxels can sense the distance
from each of them.

Second, the base of each robot is fixed to the ground for 
the entire simulation. If a robot does not touch the ground at 
the beginning of the simulation, it is translated along the z 
axis before the simulation starts, until it does. This is done 
to put emphasis on growth and deformability, ruling out lo¬ 
comotion strategies to approach the light sources. 

Third, differently from other works adopting VoxCad 
(Cheney et al., 2015, 2013; Methenitis et al., 2015), in this 
experiment there is no fast-twitch actuation mechanism (i.e. 
the fast control based on an oscillating global signal is dis¬ 
abled). A distributed growth mechanism has been imple¬ 
mented instead, acting at a slower time scale (more below). 

The task is inspired by plants, and consists in perform¬ 
ing stationary phototaxis: growing towards static sources 
present in the environment. While reaching a single source 
is not a particularly difficult task, simultaneously pointing 
toward multiple ones becomes more challenging, as it re¬ 
quires the ability to evolve modular, branching structures. 
The specific growth mechanism is detailed in the next sec¬ 
tions, as well as the underlying developmental paradigm. 

Developmental paradigm. Different approaches have 
been adopted in the literature in order to model developmen¬ 
tal processes (Stanley and Miikkulainen, 2003). Those can 
be roughly classified based on the level of abstraction with 
respect to the biological phenomenon they try to capture. 
Among high-level abstractions we find grammar-based ap¬ 
proaches (Rieffel et al., 2014; Hornby and Pollack, 2002) 
and CPPN-based ones (Stanley, 2007; Cheney et al., 2013; 
Auerbach and Bongard, 2014). Lower-level abstractions, 
broadly referred to as cell chemistry approaches (Doursat et al.,
2013; Joachimczak et al., 2014; Bongard and Pfeifer, 2003), 
model finer details of developmental processes, such as gene 
regulatory dynamics. 

Despite many achievements in cell chemistry, a drawback 
of these approaches lies in their complexity. On the other 
hand, when adopting a high-level perspective, the risk ex¬ 
ists of overlooking potentially useful aspects of develop¬ 
ment. As an example, CPPNs and grammatical encodings 
neglect both the unfolding over time of biological develop¬ 
mental processes as well as the interaction of the creature 
with the environment during those processes. 

The approach proposed here is based on CPPNs, but em¬ 
powers them by joining their ability to capture the forma¬ 
tion of regular patterns (Stanley, 2006) with an environment- 
mediated developmental stage that unfolds over time. While 
the implications of such a choice deserve to be thoroughly 
investigated in future work, we note that this approach enables
potentially interesting feedback loops during development:
growth is guided by environmental stimuli, but modifies
in turn the sensory information the creature will experience
next. Also, as mutation effects may arise at different
points during development, the ability to enact changes later
in development may allow for smoother fitness gradients
than all-or-nothing mutations (Hinton and Nowlan, 1987).

Figure 2: Different attributes of the robot can be “painted”
by different CPPNs. CPPN1 dictates the geometry of the
robot, while CPPN2 determines its growth properties. In
the current system red voxels expand in response to environmental
stimuli, while blue ones shrink.



Parameter               Description
$D_v^{(t)}$             Linear dimension of voxel v at time t
$s_v^{(t)}$             Scaling factor of voxel v at time t
$D_{vn}$                Nominal dimension of voxel v (set to 1)
$g_a$                   Growth amplitude, $\in (0, 1)$ (equal for all voxels, set to 0.5)
$N$                     Number of environmental sources
$i$                     A specific source
$s_{vi}^{(t)}$          Influence of source i on the scaling factor
[illegible]             Lower bound for [illegible] (set to 0.1)
$g_{rv}$                Growth parameter of voxel v ($\in [-1, 1]$)
$\bar{d}_{vi}^{(t)}$    Distance of voxel v from source i, normalized
                        by voxel size (the latter being set to 0.01)


Time scales. The proposed developmental paradigm em¬ 
beds all three time scales experienced by living systems: 
the evolutionary/phylogenetic time scale, the developmen¬ 
tal/ontogenetic time scale, and the sensorimotor dynamics 
timescale (Pfeifer and Bongard, 2006). The sensorimotor 
timescale is here represented by the interaction of the body 
with the environment during growth (e.g. gravity, collisions, 
detected light levels, etc.). 

The genotype. The encoding here adopted is based on 
CPPNs (Stanley, 2007). Designed to capture the forma¬ 
tion of regular patterns in developmental systems without 
modeling development per se (Stanley, 2006), CPPNs are 
networks that convolve incoming spatial information to pro¬ 
duce outputs that tend to exhibit symmetry, repetition, and 
repetition with variation. 

In order to enforce a distinction between geometric and 
growth properties, here we adopt two different CPPNs (Fig. 
2) that are queried for each voxel of a cubic workspace. 
They receive the same inputs: the 3D location of the voxel
(x, y, z), the polar radius (d), and a constant bias (b).

The first CPPN, determining the creature’s geometrical
structure, has a single output $o \in [-1, 1]$ that dictates
whether a voxel should be empty ($o < 0$) or filled ($o > 0$).
Unlike similar setups (Cheney et al., 2013), evolution is
provided with a single material in the experiments reported
here, which is assigned to all non-empty voxels. A different
stiffness can be specified for this material in different
evolutionary runs, though all voxels in each run share a
global and constant stiffness parameter.

The second CPPN determines the growth properties of
each voxel. It also has a single output $\in [-1, 1]$, henceforth
referred to as the growth rate ($g_{rv}$), whose role is described
in detail in the next section. 
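
The two-network encoding can be illustrated with a minimal Python
sketch. It is not the authors' implementation: the two "CPPNs" are
hypothetical hand-written functions standing in for evolved networks,
and the normalization of voxel coordinates to [-1, 1] is an assumption.
Only the inputs (x, y, z, d, b), the empty/filled rule on the sign of
o, and the per-voxel growth rate come from the text.

```python
import math

def to_unit(k, size):
    # Assumption: voxel indices are mapped linearly onto [-1, 1].
    return 2.0 * k / (size - 1) - 1.0

def cppn_inputs(ix, iy, iz, size):
    """Build the inputs named in the text: (x, y, z, d, b)."""
    x, y, z = to_unit(ix, size), to_unit(iy, size), to_unit(iz, size)
    d = math.sqrt(x * x + y * y + z * z)  # polar radius
    b = 1.0                               # constant bias
    return x, y, z, d, b

def toy_geometry_cppn(x, y, z, d, b):
    # Hypothetical stand-in for CPPN1: positive inside a ball of radius 0.9.
    return max(-1.0, min(1.0, 0.9 * b - d))

def toy_growth_cppn(x, y, z, d, b):
    # Hypothetical stand-in for CPPN2: growth rate varies smoothly along x.
    return math.sin(math.pi * x / 2.0)

def build_phenotype(size, geometry_cppn, growth_cppn):
    """Return {voxel index: growth rate g_rv} for every filled voxel."""
    voxels = {}
    for ix in range(size):
        for iy in range(size):
            for iz in range(size):
                inputs = cppn_inputs(ix, iy, iz, size)
                o = geometry_cppn(*inputs)
                if o > 0:  # filled; o == 0 is treated as empty here
                    voxels[(ix, iy, iz)] = growth_cppn(*inputs)
    return voxels

robot = build_phenotype(6, toy_geometry_cppn, toy_growth_cppn)
```

Because both functions see the same spatial inputs, the geometry and
the growth pattern in this toy example share the regularities of the
underlying coordinate frame (here, an antisymmetric growth rate along x).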


Table 1: Description of parameters appearing in Eq. 1 


Environment-mediated morphological development. 

We simulate environment-mediated growth by enabling 
voxels to change volume in response to environmental 
stimuli (i.e. distance from each light source). This choice 
is motivated by the fact that localized volumetric changes 
are easy to achieve with currently-available soft robots 
(e.g. exploiting pneumatic actuation). A localized change 
in stiffness is also within the reach of current technology 
(Majmudar et al., 2007), making this kind of plasticity the 
next candidate to be integrated into our system. Topolog¬ 
ical modifications are, on the other hand, more difficult 
to achieve in real soft robots, as they require adding or 
removing material. Nevertheless, attempts have been made 
in this direction as well (Brodbeck et al., 2012). 

The growth process is governed by equations and param¬ 
eters reported in Eq. 1 and in Tab. 1: 


$\forall$ voxel v, at each time step t

The growth rate parameter $g_{rv}$ determines the quality and
the extent of the localized volumetric change for each voxel.
When $g_{rv} > 0$ the voxel will expand when close to a source,
when $g_{rv} < 0$ it will shrink. When $g_{rv}$ is exactly zero, the
voxel is insensitive to environmental stimuli. The greater the
magnitude of $g_{rv}$ for a particular voxel, the more pronounced
its volumetric variation will be, which is also modulated by
the distance from the sources. Each voxel can experience
a considerable modification due to development: having set
$g_a = 0.5$ entails a 50% linear contraction/expansion with
respect to the nominal size, resulting in a $\approx 238\%$ variation
in volume. The parameter $s_{v\_min}$ ensures that the voxel does
not shrink below a given size (here 10% of the nominal
size), for stability of simulations. The quantity $\Delta s_{v\_max}$ (not
reported in Eq. 1 for ease of reading) dictates the maximum
allowed $\Delta s_v = |s_v^{(t)} - s_v^{(t-1)}|$ between two subsequent time
steps, as follows:

if ($\Delta s_v > \Delta s_{v\_max}$) then:

$s_v^{(t)} = s_v^{(t-1)} + \mathrm{sign}(\Delta s_v) \cdot \Delta s_{v\_max}$

In addition to influencing the stability of the simulation, this
parameter (set to 0.0005 in our experiments) regulates the
speed of the growth process: the higher $\Delta s_{v\_max}$, the more
rapid the growth. The value for $\Delta s_{v\_max}$ was selected in
such a way that development acts over a slower time scale 
with respect to the typical sensorimotor dynamics (such as 
those generating locomotion or grasping behavior), thus im¬ 
plementing the developmental time scale. 
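
The volume arithmetic and the per-step limit on the scaling factor
can be checked with a short Python sketch. The function names are
hypothetical, and one point is an assumption: the sign is taken on the
signed difference between the target and the previous scaling factor,
so that contraction is rate-limited as well as expansion. The constants
are the values quoted in the text.

```python
import math

G_A = 0.5        # growth amplitude g_a
DS_MAX = 0.0005  # maximum change of the scaling factor per time step
S_MIN = 0.1      # lower bound s_v_min on the scaling factor

def volume_variation(linear_scale):
    """Relative volume change of a cube whose side is scaled by linear_scale."""
    return linear_scale ** 3 - 1.0

def limited_update(s_prev, s_target, ds_max=DS_MAX, s_min=S_MIN):
    """Move s_prev towards s_target by at most ds_max, never below s_min."""
    delta = s_target - s_prev  # assumption: signed difference
    if abs(delta) > ds_max:
        s_new = s_prev + math.copysign(ds_max, delta)
    else:
        s_new = s_target
    return max(s_min, s_new)

expansion = volume_variation(1.0 + G_A)    # 1.5**3 - 1 = 2.375
contraction = volume_variation(1.0 - G_A)  # 0.5**3 - 1 = -0.875

# Lower bound on the number of steps needed to reach full expansion.
min_steps_to_full_growth = round(G_A / DS_MAX)
```

A 50% linear expansion therefore gives a volume increase of 237.5%,
which matches the rounded figure in the text, while a 50% linear
contraction removes 87.5% of the nominal volume. With these values the
scaling factor needs on the order of a thousand time steps to cover its
full range, which is what makes growth slow relative to the body's
mechanical dynamics.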

Development is based on distributed sensing and actua¬ 
tion: each voxel senses the distance from all the sources 
and acts accordingly. Nevertheless, coordinated behavior 
emerges, for at least two reasons: first, nearby voxels ex¬ 
perience similar sensory stimuli, and second, CPPNs tend 
to produce patches of tissue with homogeneous or smoothly 
varying growth parameters. 

Optimization. A multi-objective implementation of 
NEAT (Cheney et al., 2015) has been adopted. Before 
performing selection, Pareto ranking is applied, according
to three objectives: the order in which sorting is performed 
determines the relative importance of each of them. The 
objectives are listed below from the most to the least 
important: 

1. Minimize the distance from each of the sources 

2. Minimize the number of employed voxels 

3. Minimize the age of each individual 
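
As a rough illustration of ranking under these three minimized
objectives, the following Python sketch splits a population into
successive non-dominated fronts. It is a generic textbook procedure,
not the implementation used in the paper; in particular, breaking ties
within a front by objective order is an assumption about how the stated
ordering of importance might be realized, and the individuals are
invented.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_fronts(population):
    """Split objective vectors into successive non-dominated fronts."""
    remaining = list(population)
    fronts = []
    while remaining:
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining if q is not p)]
        # Assumption: inside a front, order by objective priority
        # (distance first, then voxel count, then age).
        front.sort()
        fronts.append(front)
        remaining = [p for p in remaining if not any(p is f for f in front)]
    return fronts

# Hypothetical individuals: (summed distance to sources, voxels used, age)
population = [(4.0, 120, 3), (4.0, 90, 5), (6.5, 60, 1), (7.0, 130, 9)]
fronts = pareto_fronts(population)
```

In this example the last individual is dominated by the first two and
falls into the second front, while the remaining three are mutually
non-dominated because each is best on at least one objective.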

The first objective selects for phototaxis. This is
implemented by minimizing the sum $\sum_{i=1}^{N} d_{min\_i}$, where N is the
number of sources and $d_{min\_i}$ is the minimum distance
between the robot and the i-th source. The second objective
selects for smaller robots. The first two objectives are an¬ 
tagonistic as it is easier, in general, for larger robots to be 
closer to the sources (even if they do not grow at all). The 
combination of the two objectives thus selects for robots that 
exploit the growth process and their deformability to accom¬ 
plish the task. The third objective helps maintain diversity 
in the population (Schmidt and Lipson, 2011). 
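
The first two objectives can be computed as in the Python sketch
below. It assumes that the robot-to-source distance is measured from
voxel centres, which the text does not specify, and it uses invented
coordinates for both the voxels and the sources.

```python
import math

def phototaxis_objective(voxel_positions, sources):
    """Sum over sources of the minimum voxel-to-source distance (minimized)."""
    return sum(
        min(math.dist(v, s) for v in voxel_positions)
        for s in sources
    )

# Hypothetical robot: three voxel centres in a row along x.
voxels = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
# Two lateral sources on opposite sides (positions invented).
sources = [(-3.0, 0.0, 0.0), (6.0, 0.0, 0.0)]

objective_1 = phototaxis_objective(voxels, sources)  # 3.0 + 4.0
objective_2 = len(voxels)                            # number of employed voxels
```

Adding a voxel at either end of the row would lower the first objective
and raise the second, which is the antagonism described above.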

Morphological computation and control complexity. 

Morphological computation (Hauser et al., 2014) has been 
defined as "computation obtained through the interaction of
physical forms" (Paul, 2006). When it comes to robotics
and embodied cognition (Pfeifer and Bongard, 2006), the 
idea is that part of the computation needed to perform a 
task can take place (implicitly or explicitly) not only in 
the brain/controller, but also within the body itself, pro¬ 
vided that it has suitable characteristics. It has been argued 
that this property can alleviate the computational burden of 
the brain, simplifying the controller and achieving a more 
balanced brain-body trade-off (Pfeifer and Bongard, 2006; 
Paul, 2006; Hauser et al., 2011) that could hold the key to 
more intelligent, effective and natural behaviors. Many ex¬ 
amples have been described in the literature (Pfeifer and 
Bongard, 2006). It is often postulated that systems bene¬ 
fiting from morphological computation tend to exploit the 
interaction and dynamical coupling with the environment in 
a beneficial way, e.g. leveraging passive dynamics in place 
of active control. 

We hypothesize that material properties can affect evolu¬ 
tion’s ability to exploit morphological computation. To test 
this, we here define morphology as the robot’s shape and ma¬ 
terial properties, and its ‘controller’ as the distributed growth 
mechanism which achieves phototaxis. We defend this latter 
definition as, like control, growth here closes the sensation- 
action feedback loop (although over a slower time scale). 

For our purposes, we can thus define morphological com¬ 
putation as a property that simplifies the growth controller 
by exploiting in a beneficial way the interplay between ma¬ 
terial, geometric, and growth properties through the dynam¬ 
ical interaction with the environment. 

To measure the extent of morphological computation in a 
given robot, we define control complexity as follows: 

$$H(g_c) = -\sum_{i=1}^{n} P_i \log_2 P_i \qquad (2)$$

where:

$$g_c = \{ g_{rv} \;\; \forall \text{ voxel } v \}$$

$$P_i = \int_{x_i}^{x_{i+1}} p(x)\, dx \qquad (3)$$

$$x_i = -1 + 0.02\, i, \qquad i = 0 \ldots 100$$

The real-valued random variable $g_c$ is associated with the
growth rate quantity (Eq. 1, Tab. 1), embracing all
parameters that collectively shape the growth trajectory of a given
robot. The quantity $H(g_c)$ is the Shannon entropy (Shannon,
1948) of such a variable, whose probability density function
$p(x)$ is discretized using $n = 101$ uniform bins (Eq. 3).

The control complexity of $g_c$ thus corresponds to the
number of bits that are necessary to describe the pattern of
growth parameters: the higher this number, the more
complex the controller. Consider two robots ($r_1$, $r_2$) and their
associated growth controllers ($g_{c1}$, $g_{c2}$). We will state that
a difference $\Delta H = H(g_{c2}) - H(g_{c1}) > 0$ between the
two controllers indicates that $g_{c2}$ is more complex than $g_{c1}$.
Moreover, if the two robots happen to score the same fitness,
we will argue that $r_1$ better exploits morphological
computation, as it requires simpler control to produce an equally
effective behavior. It should be made clear that we are not 
providing here a general information theoretic metric to cap¬ 
ture morphological computation (Zahedi and Ay, 2013), but 
rather a proxy to measure its effect in our setting. 
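
A minimal Python sketch of this proxy is given below. It estimates
$P_i$ from a histogram of the per-voxel growth rates and returns the
entropy in bits. One detail is an assumption: the edges
$x_i = -1 + 0.02\,i$ for $i = 0 \ldots 100$ are read as delimiting 100
intervals of width 0.02, and the bin count is left as a parameter. The
two controllers are invented examples.

```python
import math

def control_complexity(growth_rates, n_bins=100, lo=-1.0, hi=1.0):
    """Shannon entropy (bits) of the histogram of per-voxel growth rates."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for g in growth_rates:
        # Clamp the index so that g == hi falls in the last bin.
        idx = min(int((g - lo) / width), n_bins - 1)
        counts[idx] += 1
    total = len(growth_rates)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Hypothetical controllers for two robots with the same number of voxels.
g_c1 = [0.8] * 8                                     # uniform expansion
g_c2 = [0.8, 0.8, -0.6, -0.6, 0.3, 0.3, -0.9, -0.9]  # mixed expand/shrink

delta_h = control_complexity(g_c2) - control_complexity(g_c1)
```

The uniform controller occupies a single bin and has zero entropy,
while the mixed controller spreads evenly over four bins and needs two
bits, so $\Delta H > 0$ marks the second controller as the more complex
one under this measure.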

Experiments. Populations composed of 30 individuals are
allowed to evolve for a maximum of 1500 generations. The
maximum evaluation time for each individual is 3.5 s of
simulation time (wall-clock time is higher). The simulation is stopped
earlier if the robot settles into a static conformation before 
the allocated time elapses. The growth process starts after 
the first 0.5 seconds, which usually allows the initial shape 
to settle into an equilibrium position. 

A first set of experiments is performed to qualitatively 
assess the overall ability of the system to evolve effective 
robots. To this end, 10x10x10 and 8x8x8 robots have been 
evolved in several environments, differing in the number 
(from 1 to 4) and position of environmental sources (Fig. 
3). Material stiffness is set to E = 5 MPa, corresponding
to a rather soft material (comparable to rubber). Five runs 
were performed for each configuration. 

A second set of experiments is then performed with
6x6x6 robots, characterized by different stiffness values
($E_1$ = 500 MPa, $E_2$ = 5 MPa) evolving in the same
environment (2 laterally placed sources). Twenty independent
runs were performed for each treatment. Reported confi¬ 
dence intervals are computed with a bootstrapping method, 
while p-values are the result of the Mann-Whitney U test. 
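
The text does not state which bootstrap variant or which summary
statistic was used, so the following Python sketch should be read only
as one common choice: a percentile bootstrap interval for the mean,
together with the Mann-Whitney U statistic computed by direct pair
counting (the p-value itself would normally be obtained from a
statistics library). All numbers are invented and are not data from the
paper.

```python
import random
import statistics

def bootstrap_ci(samples, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(samples, k=len(samples)))
        for _ in range(n_resamples)
    )
    lower = means[int((1.0 - level) / 2.0 * n_resamples)]
    upper = means[int((1.0 + level) / 2.0 * n_resamples) - 1]
    return lower, upper

def mann_whitney_u(a, b):
    """U statistic for sample a: pairs with a > b count 1, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

# Invented per-run values for two treatments.
stiff = [0.31, 0.28, 0.35, 0.30, 0.33, 0.29]
soft = [0.22, 0.25, 0.19, 0.24, 0.21, 0.27]

ci_low, ci_high = bootstrap_ci(stiff)
u_stiff = mann_whitney_u(stiff, soft)
u_soft = mann_whitney_u(soft, stiff)
```

The two U values always sum to the number of pairs; in this invented
example every value of the first sample exceeds every value of the
second, which is the most extreme separation the test can observe.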

The code used to produce these results is publicly avail¬ 
able at: https://goo.gl/cA21uO. A video show¬ 
ing some of the creatures in action is available at: 
https://youtu.be/Cw2SwPNwcfM 

Results and Discussion 

A sample of the fittest morphologies evolved in preliminary 
trials is reported in Fig. 3 and in the accompanying video. 
Symmetry and modularity can be observed, with the latter 
property evidenced by the emergence of relatively indepen¬ 
dent appendages. As these features are ubiquitous in nat¬ 
ural systems, their presence here may suggest the potential 
for more competent and scalable virtual creatures. More¬ 
over, these properties appear to be selected for by our task 
and environment, as this level of morphological regularity is 
not common in similar settings (Cheney et al., 2013, 2015; 
Methenitis et al., 2015). 

It can be noted that the best individuals from these runs
tend to only exploit expanding (red) tissue, and not
shrinking (blue) voxels (Fig. 3). Evolved creatures appear able to
leverage their passive deformability and interaction with the
environment in order to solve the task, rather than requiring
differential expansion and contraction to point towards light
sources. For example, cantilevers spontaneously evolve (Fig.
3b). Like human-built cantilevered structures, these robots
distribute stresses across themselves with a minimum of
support structure. Moreover, the curvature needed to point
towards the two lateral sources spontaneously emerges from the
passive interaction of the expanding body with the
environment (and with gravity, in particular), rather than through
internal actuation of the creature. This corresponds,
intuitively, to the idea of morphological computation.

Figure 4: Average fitness over 20 independent runs. Softer
robots have an evolutionary advantage over stiffer ones in
this task/environment.

Given the scarce presence of highly-fit robots exploiting
more complex forms of control (based on the combination
of shrinking and expanding voxels), we hypothesize
that it may be easier for evolution to discover solutions
based on morphological computation rather than explicit
control, provided that material properties allow it to
do so. This would confer, in general, an evolutionary
advantage to robots that have the "right" material properties for a
given task/environment. This hypothesis is tested with the
second set of experiments.

Geometry, materials, growth, and morphological
computation. Fig. 4 reports the evolution of robots with stiffer
($E_1$) or softer ($E_2$) material properties, optimized to
simultaneously approach two lateral light sources placed on
opposite sides of the creature. Results show that softer robots
have an evolutionary advantage over stiffer ones in this par¬ 
ticular task/environment, which may be attributed to evolu¬ 
tion’s ability to better exploit morphological computation in¬ 
stead of developing more complex and active forms of con¬ 
trol to regulate their shape. This can be qualitatively ob¬ 
served in the comparison of two highly-fit robots reported in 
Fig. 1 (see them in action in the accompanying video). With 
the softest material, evolution produced a passive cantilever, 
taking advantage of gravity to unfold the shape towards the 
two sources during growth. Under the effect of gravity, the
structure passively deforms and achieves an effective
curvature, being able to sustain its own weight and point towards
the sources at the same time. On the other hand, the stiffer
robot fights gravity holding the two rigid arms horizontally,
achieving the curvature needed to direct its appendages
towards the sources through the antagonistic action of
shrinking and expanding voxels in the central part of the body.

Figure 3: A small sample of the growing soft creatures, evolved in environments featuring: a-c) two lateral sources d) four
sources placed at the corners. a,d) top view, b,c) front view. Most of the highly fit soft robots only exploit expanding tissue.

Further analyses suggest the generality of these observa¬ 
tions in this task/environment. Figure 5 shows that stiffer 
robots use substantially more shrinking voxels in general. 
Their growth processes involve more active control in the 
sense that the appendages are pulled as well as pushed to 
achieve proximity to the environmental sources. This sug¬ 
gests that for stiffer robots, simpler strategies in which cur¬ 
vature is achieved passively in response to weight are either 
harder to find in the search space or are not viable at all. 

Given the fitness benefit of expanding voxels, enlarging 
the volume of the robots and allowing them to approach 
the sources more closely, we would expect shrinking voxels 
only to be employed when necessary to control the direction 
of evolved appendages. Thus, the presence of more
shrinking voxels in the stiffer robots suggests that they are unable
to perform passive pointing from their material properties 
alone, as exemplified in the softer robots (Fig. 1). 

The intuitive considerations regarding control complex¬ 
ity and morphological computation are also confirmed by an 
information theoretic analysis of the evolved robots: the
average control complexity $H(g_c)$ (the global entropy across
$g_c$) is significantly higher for the stiffer robots than for the
softer ones (Fig. 6). In other words, stiffer robots employ 
more complex and active controllers. This again suggests 
that their morphologies are unable to perform the task to the 
same level without control. 

In summary, softer robots better exploit morphological
computation in this particular task/environment, achieving
better performance (Fig. 4) with simpler controllers (Figs.
5 and 6). This confers an evolutionary advantage on them over
stiffer robots, if evolved alongside.

Figure 5: Stiffer robots tend to employ significantly more
shrinking voxels than softer ones (p < 0.002), in the attempt
to actively control the shape.

It should be noted that these results do not mean that 
softer is always better. It would be possible to define a 
task/environment that confers an advantage to stiffer robots 
(e.g. grow towards sources placed mid-air, where it is eas¬ 
ier for stiffer robots to sustain their weight). What has been 
demonstrated is that for a specific task/environment, mate¬ 
rial properties can have a pronounced effect on evolution’s 
ability to exploit morphological computation, i.e. to produce 
well adapted morphologies that exploit the interaction with 
the environment in a beneficial way. 

Conclusions and Future Work 

In this paper a novel system to study the evolution and
development of soft-bodied creatures has been presented. The
system is able to evolve robots that exhibit desirable
morphological properties such as symmetry, modularity, and
exploitation of morphological computation. More specifically,
it was shown that certain combinations of geometry,
material properties, and environment-mediated growth make it
more or less difficult for evolution to discover phenotypes
that exploit morphological computation. Despite being a
fundamental aspect of soft robotics, the interplay among
these properties remains to date largely unexplored.
Results also suggest that arbitrarily fixing even one of these
dimensions can make it difficult for evolution to produce
effective behaviors. Ideally, all of them should be put under
evolutionary and/or developmental control, so that an
optimal combination can be discovered. Future work will be
directed towards exploring the potential evolutionary
advantage of morphological plasticity, as well as possible benefits
in terms of adaptivity and robustness. The environmental
influence during development deserves special attention as
well: its potential benefits for organisms as well as
adaptive machines will be investigated. Another major topic that
can now be studied is the general relationship between
evolution, development and adaptive behavior. We believe that
many interesting questions can be answered using the
approach described here, which could help shed light on
biological questions while simultaneously contributing to
engineering fields such as soft robotics.

Figure 6: Stiffer robots exhibit more complex growth
controllers than softer ones (p < 0.005), yet the latter achieve
better performances (Fig. 4). It is argued that this difference
is due to morphological computation, strictly connected to
the material properties of robots in the two treatments.

Abstract

Evolution-in-Materio aims to exploit real-world physics of 
materials to achieve computation by a combination of ex¬ 
ternal stimulus and interpretation of the state of materials 
through measurements and observations. In the majority of
Evolution-in-Materio work the dynamics of the material are
filtered out, or the problem is defined in such a way that the
sought solution is a point attractor. In this work we explore the
dynamics of materials. Under the assumption that suitable
materials exhibit rich behavior emerging from the
underlying physical processes, there should be observable
behavior similar to that of Dynamical Systems with Dynamical
Structures ((DS)^2). Such behavior results in systems with the
possibility of inducing perturbations to their own dynamics.
Further, the importance of the observation level used when
observing and interpreting the state of the materials is discussed
and related to dynamics in Evolution-in-Materio systems.

Introduction 

Evolution-in-Materio (EIM) (Miller et al., 2014) can be seen
as a method to explore unconventional computation, i.e.
a computer operating outside of the traditional Turing/von
Neumann (Turing, 1937; von Neumann, 1993)
computational model and architecture, exploiting the power of
evolution, i.e. Computer Controlled Evolution (CCE), to
manipulate a physical system to search for regimes where the
intrinsic properties of materials provide useful computation.

The concept of EIM is, as stated by Miller et al. (2014):
"to exploit the intrinsic properties of materials, or
'computational mediums', to do computation, where neither the
structure nor computational properties of the material needs to be
known in advance (Miller and Downing, 2002). In this way
evolution is a bottom-up design process that can exploit
natural physical processes to do useful computation."

Herein the bottom-up design concept is further
investigated so as to gain deeper insight into exploiting
physical systems such as materials for computation "where
neither the structure nor computational properties of the
material needs to be known in advance". Both the structure and
the intertwined underlying physics leading to computation
are products of the bottom-up design approach taken.


When a bottom-up process such as evolution acts on
physical systems, there may be intrinsic processes in the
computational medium that influence the computational function
as a result of underlying properties, resulting in a non-static
structure and thereby non-static functionality, i.e. a two-way
coupling between dynamic structure and functionality. A
system that can induce perturbations to its
own dynamics as a function of its system states enables
state space trajectory changes and topological
reconfigurations of the state space (Omholt, 2013).

In this paper we show that such behaviours, present in
living systems, can also be found in EIM systems, making EIM
a physical realisation of Dynamical Systems with
Dynamical Structures ((DS)^2) (Spicher et al., 2004). In such
systems state transition functions and the set of state variables
can change over time. 
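
What it means for both the state variables and the transition
functions to change over time can be shown with a deliberately small
Python toy. It is purely illustrative and is not taken from Spicher et
al. (2004) or from any EIM setup: the variables x and y, the threshold,
and the rules are all invented.

```python
def step(state, rules):
    """Advance a toy system whose variables and rules can both change."""
    new_state = {name: rules[name](state) for name in state}
    # Structural change: once x exceeds a threshold, a new state variable y
    # appears and the transition function of x starts depending on it.
    if "y" not in new_state and new_state["x"] > 4:
        new_state["y"] = 0
        rules["y"] = lambda s: s["y"] + 1
        rules["x"] = lambda s: s["x"] - s["y"]
    return new_state

rules = {"x": lambda s: s["x"] + 2}  # initial transition function
state = {"x": 0}                     # initial set of state variables

history = [dict(state)]
for _ in range(5):
    state = step(state, rules)
    history.append(dict(state))
```

The system starts one-dimensional, and its own state triggers the
appearance of a second dimension and a new rule for x. The state space
itself is therefore reconfigured by the dynamics, which is the property
the (DS)^2 view emphasizes.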

Reaction-diffusion systems, as in the work of Adamatzky (2009),
exploit massively parallel state updates in growing
patterns where information processing takes place. The
reaction-diffusion computer is based on local interactions
and change of spatial properties over time. Growth of
patterns changes the state and topological properties of the
machine, providing a similarity to the (DS)^2 dynamical
reconfigurations of the state space.

Further, it can be useful to compare the concept of EIM 
with morphological engineering (ME) (Doursat et al., 2013).
Central to ME is the concept of agents and the mechanisms 
involved in controlling them. EIM places less focus on the 
interaction of the individual agents, and more focus on ex¬ 
ploitation of the emergent behaviours of the physical sys¬ 
tems under study. Though local interaction between smaller 
parts of the physical matter (i.e. electro-chemical interaction 
between molecules or concentrations of chemicals) is cru¬ 
cial, EIM attempts to take a step away from the agent-focus 
and consider emergent properties a primary unit of study. 
However, one of the central aspects of ME, i.e.
"endowing physical systems with information" (Doursat et al., 2013),
does capture EIM. 'Invisible' dynamics of physical systems
can be manipulated without direct access to the individual
agents’ rule sets by giving the systems the ability to filter or
react to information and energy presented through physical
interfaces.

In this article, an EIM system is observed from a (DS)^2
system view. The results show that such a view is applicable
and can be used to gain further insight into the workings of
EIM systems. As a result, EIM may share several
properties with self-organizing systems. Herein artificial
developmental systems (Kumar and Bentley, 2003) are used as
a starting point for exploring and exploiting computational
mediums where computation is a product of underlying
dynamical physical processes.

An implication of "exploiting the intrinsic properties of
materials" is that self-organizing processes and the
resulting computation do not necessarily comply with our most
widely used computational paradigm, that of digital computers. EIM
can be seen as a hybrid variant of Analog Computation (AC)
(MacLennan, 2007). In AC, the fact that a mathematical
function provides a model of the observed behaviour of a
physical system is used 'conversely': a physical system can
be used to calculate a mathematical function, since a
physical system can be parameterized and adjusted to match a
large number of mathematical functions. Configurable
Analogue Processors (CAP) or Field Programmable Matter
Arrays (FPMA) (Miller and Downing, 2002) are recent
examples of this paradigm. The hybrid approach of EIM includes
the computational matter, e.g. CAP or FPMA, in a
mixed-signal system using a digital computer to configure and
communicate with the material.

This approach combines the computational power
of the material with the ease of programmability of digital
computers (Broersma et al., 2012). In a hybrid approach
observability is a core issue: the data from the
material must be observable and sound without using more
computational power for the observation than for the actual
computation (Bremermann, 1962). A practical implication of
observability is the need to choose what the
smallest possible change is to be on the top level, the top level
being the level at which the observable computational result(s)
emerge from the underlying physical processes.

Shannon (1941) provided an early theoretical model of
analogue computers, the General Purpose Analog Computer
(GPAC). However, application of the GPAC model to EIM
seems unnatural, since it relies on chaining together
components (reminiscent of agents with particular tasks) and
defining bigger expressions out of these. Neural networks are yet
another potential model and link between the physical EIM
system and a theoretical model; this link is further discussed
in Broersma et al. (2012).

Figure 1 shows an EIM system using a digital computer 
to host an EA to configure a material for computation. The 
material operates in the analogue/physical domain and the 
computer responsible for input/output mapping and config¬ 
uration will operate in the digital domain. 

As stated, our focus is to explore EIM in a (DS)^2 view.
In Figure 1 the configurable material is the self-organising
system. The material exhibits dynamical properties and
responds to perturbations from the input and configuration
signals. To observe dynamical properties, the trajectory of the
system is used as a quantifiable measurement of behaviour.
In artificial development similar measurements have been
used as a measurement of evolved complexity (Nichele and
Tufte, 2013) and to show the intertwined influence of structure
and computation ((DS)^2) on artificial developmental
systems (Tufte, 2009).

Figure 1: Principle of EIM using a hybrid approach. Taken
from Miller et al. (2014).
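
As a generic illustration of how a trajectory can be quantified, the
Python sketch below measures the transient length and the attractor
cycle length of a deterministic system observed through discrete
states. It is not the measure used in the cited works, whose details
are not given here; the function names and the 4-bit toy update rule
are hypothetical stand-ins for digitally observed material states.

```python
def trajectory_profile(step, initial_state, max_steps=10_000):
    """Return (transient_length, cycle_length) of a deterministic discrete system."""
    first_seen = {}
    state = initial_state
    for t in range(max_steps):
        if state in first_seen:
            transient = first_seen[state]
            return transient, t - transient
        first_seen[state] = t
        state = step(state)
    raise RuntimeError("no repeated state within max_steps")

def toy_step(state):
    # Hypothetical 4-bit system: the next state depends only on the current one.
    return (state * state + 1) % 16

transient, cycle = trajectory_profile(toy_step, 2)
```

Starting from state 2 the toy system visits 2, 5, 10 and then returns
to 5, giving a transient of one step and a cycle of length two. A point
attractor appears in this description as a cycle of length one.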

The systems herein are all observed at an electrical level.
The underlying physical (and electrical) processes may
change over time on the microscopic level; however, our
observations are on the 'digital level'. As such, the goal herein
is to be able to exploit the power of EIM without a high cost
in ensuring the correctness of the observation, i.e. an
underlying rich physical system observed in the constrained digital
domain.

Background 

Evolution-in-materio extended to (DS)^2

Gordon Pask’s pioneering work in EIM (Pask, 1959) is an
interesting piece of work if viewed in a (DS)^2 setting. Pask
observed (by eye) and evaluated (by ear) the growth of
neural-like structures using an electrochemical device made of
a dish with electrodes covered by a metal salt solution. By
adjusting the current between electrodes Pask was able to
grow iron connections that responded to different
frequencies. Pask’s system shows the principles of a (DS)^2 EIM
system. The growth of the connections depends on
current: when current passes, the connections grow. By
adjusting the current in different regions a structure emerges and
is adjusted toward the evaluator’s (here Pask’s) goal. The
growth of the connections is governed by self-organization
and perturbations. The system is constantly changing,
making the response to perturbations dependent on the system’s
present state. In Pask’s work this property of change by
self-organization and perturbations was exploited toward the
goal of growing "an artificial ear".

(a) Block diagram of the Mecobo hardware interface.
(b) Picture of the Mecobo motherboard with mixed signal daughter board.
(c) The material bay filled with salt crystals (red arrow).
Figure 2: Overview of the Mecobo hardware interface.

In the work of Clegg et al. (2014) a material of Carbon
Nanotubes (CNTs) was exploited to solve the travelling
salesperson problem by evolving static configurations that
manage to solve the problem, i.e. the material’s dynamic
properties were filtered out by allowing the system to reach a
steady state, or point attractor. Thompson’s experiments
using a Field Programmable Gate Array (FPGA) as a material
(Thompson et al., 1999) evolved a static configuration
defining the internal circuit architecture. The input signal
(similar to Pask’s) was two frequencies that changed the output
value depending on the input frequency (a frequency
discriminator). The static configuration in Thompson’s FPGA
experiments does not allow for dynamic structures. The
input signal perturbed the system to a binary observable
output. However, the underlying dynamics (internal state
transitions) of the system were untested.

In Harding and Miller’s EIM work utilizing Liquid
Crystal Displays (LCDs), the material is viewed as a type of
static device with an evolved configuration that enables the
LCD to compute, e.g. as a frequency discriminator (Harding
and Miller, 2004). However, the LCD is not a static device.
Its behaviour changes if it is disconnected and reconfigured.
To regain the previous behaviour a short re-evolution was
required.

Systems that explicitly exploit dynamic structures such 
as slime moulds (Adamatzky, 2016) or the combination of 
CNTs and liquid crystals (Massey et al., 2015b) show a
clear connection between dynamic structure and
computation similar to a (DS)^2 system.

Given Pask’s work, the work on explicit dynamic
structures, and the observed change in response upon
reconfiguration of the LCD, it can be argued that a closer look at EIM
systems as (DS)^2 systems is fruitful in order to achieve more
complex behaviour in such computational mediums. Moreover,
including more knowledge of the underlying bottom-up
processes ((DS)^2) enables an increased understanding of and
insight into evolutionary exploitation of EIM systems toward
complex computational tasks.

Materials 

Recently the NASCENCE (NAnoSCale Engineering for
Novel Computation using Evolution) project (Broersma
et al., 2012) has provided a variety of material samples based
on CNTs (Massey, 2013), a mix of liquid crystals and CNTs
(Massey et al., 2015b) and gold nano particles (Boses et al.,
2015) for EIM research. Studies of single walled carbon
nano tubes (SWCNT) (Massey, 2013; Massey et al., 2015a)
have shown that these CNTs have novel electrochemical
properties and have the potential to do computation (Massey
et al., 2015b).

The results of NASCENCE (e.g. Mohid et al., 2014;
Clegg et al., 2014; Massey, 2013) have shown that computational
results can be achieved without explicitly exploiting
the dynamics of physical systems. Here the exploration of
$(DS)^2$ expands our target behaviour of materials to a physical
system with rich dynamic properties and complexity
at many scales, which is a point frequently brought up in
the complex systems literature. Often one divides a system
into two broad categories relative to the observation level or
scale (Sayama, 2015; Bar-Yam, 1997; Fromm, 2004).
Microscopic properties or behaviours are those that potentially
give rise to emergent macroscopic properties. At different
levels of observation, different macroscopic behaviours
(such as emergence and self-organization) exist, and thus the
complexity of a system depends on the level of observation;
as Simon (1962) writes, “How complex or simple a structure
is depends critically upon the way in which we describe it.”

A core idea in EIM is that it should be possible to manipulate
the microscopic behaviour of the material by providing
input and ’configuration’ energy (which potentially
affects all scales of the system) such that the emerging or
self-organizing behaviour is both observable and useful in
terms of computation.

To further drive home the point of richness, the materials
used in the experiments are chosen to be very different. The
first material is single walled carbon nano tubes in a static
physical configuration, i.e. changes in electrical charge cause
the dynamics. This material, used and exploited for
computation within the NASCENCE project, shows an
electrically observable response from underlying electrical
networks, exploited by evolution to emerge at the macroscopic
level. The second material is common kitchen table
salt, a material that is conductive when mixed with water.
The crystalline form of salt is for the most part
non-conductive since the ions are bound up in a crystal lattice
structure, but by adding a tiny amount of water to the
crystals conductivity is achieved. In contrast to the CNT
material, the salt structure is not static.

Method 

The overall goal of EIM is to find methods and materi¬ 
als that can serve as complex computing systems. Meth¬ 
ods should be capable of exploring and exploiting materials 
toward achieving useful computation, and materials should 
have inherent properties that enable this. The systems are 
physical systems, hence the material will operate in a real 
environment, and in particular, the environment will be part 
of the system. The main method is the bottom-up design 
approach of evolution. In a complexity setting, evolution
is argued to be a process that builds complexity (Holland,
2012). The building of complexity can also be considered
as a case where the system learns to program itself, the third
of Brian Arthur’s methods of complexity growth (Arthur,
1993). Further, the concept of growth of complexity fits
well into a $(DS)^2$ setting: a system that evolves toward more
complex dynamic behaviour through perturbations from the
environment and the dynamics of the system itself (Nichele and
Tufte, 2013).

An experimental approach is taken to investigate and explore
the relation between EIM and $(DS)^2$: evolution is exploited
to configure materials with a behaviour that shows
induced perturbations to its own dynamics. The experimental
setting is in principle as shown in Figure 1. The experiments
are designed to unveil the intertwined influence between the
dynamics of the underlying physics, the input data and the
configuration signals. To put the system in a $(DS)^2$ setting,
input and configuration signals are considered as external
perturbations to the system. Figure 4 illustrates the experimental
setting at a system level.

Figure 4: System view of the experiments. The material
is considered as a $(DS)^2$ system capable of inducing
perturbations to its own dynamics. The external perturbations,
indicated as an input arrow (Perturbations), include input data
and configuration signals for the material. The output arrow
(Readout) indicates the external observation of the system state.

The material in the figure includes a set of state transitions.
The state transitions can be perturbed externally. Each state
in the figure is observable through the read out signal. If the
system exhibits $(DS)^2$ behaviour it should be possible to
detect different trajectories (induced perturbations to its own
dynamics) whilst the external perturbation is unchanged,
illustrated by dotted arrows. In the figure each state is
indicated as one or several internal states (internal state(S)) to
illustrate the property of topological reconfigurations of the
state space.

In literature on artificial (and biological) evolution (AE) 
many terms are used for breaking down the various parts of 
a search. In a physical and ’real’ system such as the EIM-based
ones it is not immediately clear where the boundaries
lie between these parts, such as genotype, phenotype and the
genotype-phenotype map. The definition of these terms is
context-dependent, and EIM mixes two contexts: the context
of artificial evolution and the context of physical, real life
systems. For the purposes of EIM it is sufficient when dis¬ 
cussing these terms to simply note that when we discuss the 
phenotype of a system, we are talking about the observed en¬ 
tity interacting with the physical environment, e.g. the elec¬ 
trical current flowing through the material, and observed as 
state space trajectories, i.e. the observable read out in Figure 
4. Since the material is a part of the environment by virtue 
of being a physical object, there is inevitable interaction be¬ 
tween the material and the environment. The genotype can 
still be treated as is common in AE; the entity operated upon 
by genetic operators such as mutation and crossover. 

The relation between the genotype and phenotype is the
genotype-phenotype map. This map takes a genotype as input
and transforms it to a real-world entity with real-world
physics, and in particular includes the interface used to produce
the desired manipulative phenomena, such as
digital-to-analog converters. It is however worth noting that only
what is observable to the EA (phenotypic behaviour or
environmental effects of its existence) can be used as input to a
fitness function.

Figure 3: As an illustration of the experimental results, a section of the full state space trajectory for an experimental run on
salt/water solution with . A cut out in blue highlights state space trajectory changes (presented in Figure 5).

The experimental platform 

The experimental results in this paper were achieved us¬ 
ing the Mecobo platform (Lykkebo et al., 2014), a hard¬ 
ware/software implementation of an EIM system. The 
Mecobo platform is shown in Figure 2. 

Figure 2(a) shows the overall design. Configuration
specifications, i.e. genotypes, are loaded from a PC to Mecobo
over the USB port. The micro controller communicates with
the USB interface and with an FPGA on an internal bus. The
FPGA can interface directly to materials or, as in the figure,
use a daughter board to extend the signal range as shown in
Figure 2(b).

Mecobo is capable of controlling close to 100 individual
configurable input/output signals (pins) that connect to the
material. Each signal is described by parameters at a given
point in time, e.g. recording pin from time 0, output frequency
pin from time 0 to 10, or output pin voltage level
2.7V from time 0; see Lykkebo et al. (2014) for a detailed
presentation of the Mecobo hardware and software.

The material samples are placed in a material bay, as seen
in Figure 2(c), pointed to with a red arrow, and connected to the
Mecobo platform.

A standard genetic algorithm with tournament-based
selection of size 3, a population size of 30 and a mutation
probability of 0.2 was used. The mutation is drawn from
a Gaussian distribution with $\sigma = 1$ and $\mu = 0$. The genome
consists of 3 floating point numbers, from 0 to 1, which are
scaled to integer square wave frequencies by $10^6$ during the
genotype-phenotype mapping process.
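
The GA parameters above can be sketched in a few lines of Python. This is a minimal illustration rather than the Mecobo implementation: the function names are hypothetical, and three details are assumptions not stated in the text (the mutation probability is applied per gene, mutated genes are clamped back to [0, 1], and the scaled frequencies are read as Hz).

```python
import random

POP_SIZE = 30          # population size (from the text)
TOURNAMENT_SIZE = 3    # tournament size (from the text)
MUTATION_PROB = 0.2    # mutation probability; applying it per gene is an assumption
SIGMA, MU = 1.0, 0.0   # Gaussian mutation parameters (from the text)
FREQ_SCALE = 10**6     # scaling used in the genotype-phenotype map


def random_genome(rng):
    """Three floating point genes in [0, 1]."""
    return [rng.random() for _ in range(3)]


def mutate(genome, rng):
    """Add Gaussian noise to each gene with probability MUTATION_PROB.

    Clamping to [0, 1] is an assumption; the text does not say how
    out-of-range values are handled.
    """
    child = []
    for gene in genome:
        if rng.random() < MUTATION_PROB:
            gene = min(1.0, max(0.0, gene + rng.gauss(MU, SIGMA)))
        child.append(gene)
    return child


def to_frequencies(genome):
    """Genotype-phenotype map: scale genes to integer square wave frequencies."""
    return [int(round(g * FREQ_SCALE)) for g in genome]


def tournament_select(population, fitnesses, rng):
    """Return the fittest of TOURNAMENT_SIZE randomly drawn individuals."""
    contenders = rng.sample(range(len(population)), TOURNAMENT_SIZE)
    best = max(contenders, key=lambda i: fitnesses[i])
    return population[best]
```

With these definitions, a genome such as `[0.0, 0.1, 1.0]` maps to the three input frequencies `[0, 100000, 1000000]`.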

Figure 5: Zoomed region in blue in Figure 3, showing the
time-dependent behaviour of a driven evolved system in salt.

Each individual is run 5 times and stability (i.e. repeatability
of the measured states) is part of the fitness. One run
collects $T$ state vectors $S(t)$ in a vector $B_r$, $r \in [0,4]$.

The fitness function counts the number of unique states
measured during one time period $T$, where unique means
that for all pairs $(t_1, t_2) \in [0, T]$, $S(t_1) \neq S(t_2)$, and divides
the number of unique states by $T$. Finally, we take the
cosine between all pairs of $B_r$-vectors and multiply the fitness
by this number, ensuring that systems that have very similar
trajectories in state space are awarded a high fitness.
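
One possible reading of this fitness function is sketched below. The text does not specify how the pairwise cosines are combined or which run supplies the unique-state count, so the sketch assumes (i) each run $B_r$ is flattened into one long bit vector, (ii) the stability factor is the mean cosine over all pairs of runs, and (iii) the unique-state fraction is averaged over the runs. All function names are hypothetical.

```python
import math
from itertools import combinations


def unique_state_fraction(states):
    """Number of distinct states in one run divided by the run length T."""
    return len(set(states)) / len(states)


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fitness(runs):
    """runs: at least two repeated runs B_r, each a list of T state tuples S(t)."""
    # Flatten each run into a single bit vector so runs can be compared (assumption).
    flat = [[bit for state in run for bit in state] for run in runs]
    pairs = list(combinations(flat, 2))
    # Mean pairwise cosine as the stability factor (assumption).
    stability = sum(cosine(u, v) for u, v in pairs) / len(pairs)
    # Unique-state fraction averaged over the repeated runs (assumption).
    novelty = sum(unique_state_fraction(run) for run in runs) / len(runs)
    return novelty * stability
```

Under these assumptions, perfectly repeatable runs (cosine 1) score exactly their unique-state fraction, while runs that visit many states but disagree with each other are penalized.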

The material bay has room for 60 connectors, of which we
used 50 to connect to the Mecobo platform. 3 pins, spread
out in the material, were then selected as designated input
(to the material) pins, one pin was selected as a current sink
and the remaining 46 pins were used as material output pins,
whose digital values were recorded over a time period of 200
µs at 20 kHz.

The output pins are connected directly to the FPGA input
buffers and further directly connected to FPGA-internal
flip-flops with a triggering voltage of 1.7V. We define a state as a
vector of size $n$, $S(t) = (o_1(t), o_2(t), \ldots, o_n(t))$, where $o_i(t)$
is the value of flip-flop $i$ in the FPGA at time $t$.
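
As a small illustration of this state definition, the sketch below thresholds a list of pin voltages at the triggering voltage and packs the resulting bit vector into an integer label, as one might do to name nodes in a state graph. The helper names, the use of `>=` at exactly 1.7V, and the bit ordering are assumptions made for the example.

```python
TRIGGER_VOLTAGE = 1.7  # flip-flop triggering voltage (from the text)


def state_vector(voltages, threshold=TRIGGER_VOLTAGE):
    """Map pin voltages to a tuple of binary flip-flop values o_i(t)."""
    return tuple(1 if v >= threshold else 0 for v in voltages)


def state_id(state):
    """Pack a binary state vector into an integer label (first pin is the most significant bit)."""
    label = 0
    for bit in state:
        label = (label << 1) | bit
    return label
```

For example, the voltages `[0.0, 1.7, 3.3, 1.69]` give the state `(0, 1, 1, 0)`, which is labelled 6.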

We do not claim that the targeted system does useful 
computation. The purpose of this example is to demon¬ 
strate the existence of potentially complex behaviour in a 
driven physical system, which gives potential for useful 
computations. The chosen trajectory metric for the fitness
evaluation is mainly chosen to be able to investigate
$(DS)^2$ behaviour. However, the metric is in accordance with
an abstract measurement of complexity as used by Langton
(1991), Wolfram (1984) and for developmental systems
(Kowaliw, 2008; Nichele et al., 2016).

Material Samples 

The ’material cup’, in fact a Multichannel Systems micro
electrode array (MEA) model (60MEA100/10iR-Ti),
pointed to in red in Figure 2(c), holds salt crystals formed
by letting a solution of 10 mL of water with 1 mL of kitchen
table salt dry out. Before each evolutionary run these salt
crystals are mixed with 50 µL more of water to allow charge
movement in the crystals. A second such MEA was filled
with single walled carbon nano tubes (SWCNT) in a PMMA
polymer solution. The same experiment is run on both. For
control purposes, a fully conductive carbon plate was used.

Results and discussion 

The Genetic Algorithm (GA) was used to provide data to be
analyzed in a $(DS)^2$ setting. The resulting phenotypes from
the evolutionary runs were analyzed by examining the state
space traversal. In all results, $(DS)^2$ behavior was found.
Figure 3 shows parts of a full state space traversal on dry
salt crystals, visualized in graph form. Each node represents
one state, and each arc between states one state transition.
Figure 5 shows a zoomed version of the box marked in blue
in Figure 3, with out-takes of the more interesting
dynamic (time-dependent) behaviors in the run. Each
edge is marked with the time-step (t) that the transition
occurred on, along with the input as a tuple of 3 binary values
(a,b,c). For a given input and a state one could expect that a
transition should go to the same state, but as we can see for
instance in state 85 of Figure 5 this is not the case: at t=102,
in=(1,0,0) there is a transition to state 86, whereas for t=120
the transition goes to a different state, 99. The red line in
Figure 3 traces out the full path of this branching behavior, turning
orange at the second pass through state 85. This demonstrates
a time-dependent behavior where the traversal of the
state space depends on previously seen states.
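
The branching described here (the same observed state and input leading to different successor states) can be detected mechanically from a recorded trace. The following is a minimal sketch with hypothetical names; it assumes the trace is available as two parallel lists of state labels and input tuples, one entry per time step.

```python
from collections import defaultdict


def find_branching(states, inputs):
    """Return {(state, input): successors} for pairs with more than one successor.

    states[t] and inputs[t] are the observed state label and the input
    tuple at time step t.
    """
    successors = defaultdict(set)
    for t in range(len(states) - 1):
        successors[(states[t], inputs[t])].add(states[t + 1])
    return {key: nxt for key, nxt in successors.items() if len(nxt) > 1}
```

On a toy trace modelled on the state 85 example (not measured data), `find_branching([85, 86, 85, 99], [(1, 0, 0), (0, 0, 0), (1, 0, 0), (0, 0, 0)])` reports that state 85 under input (1,0,0) has the two successors 86 and 99.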

The same method and set-up was used on the carbon
nano tube (CNT) material sample. Time-dependent behavior
whilst traversing the state space was found in
all runs with the carbon nano tube material sample as well.
Figure 6 shows an out-take with dynamic (time-dependent)
behaviors from the full state space traversal graph.

Figure 6: A state graph of a driven, time-evolved carbon
nano tube sample.

The GA typically achieves a fitness of 0.75 out of a maximum
of 1.0 within 50 generations, meaning that it finds a stable
behavior that generates roughly 3/4 unique states relative
to the chosen observation level. This result is achieved in
all cases tested. The number of transitions that show the
time-dependent branching behavior discussed in relation to
figure 6 is typically around 5. The chosen genome most
often would map to frequencies around 100 kHz, which
is above the Nyquist frequency (half of our sampling
frequency of 50 kHz), meaning that we cannot fully reconstruct
the input signal from the samples. However, this simply
underlines our previous points relating to where one sets the
observation level: we are not concerned with signal
reconstruction, but rather the system’s ability to produce behavior
in the $(DS)^2$ context.

This behavior is further shown in figure 7 for the salt
crystals, and the plot is similar when using CNT as a material.
The plot shows periodic behavior interspersed with spurious
stable states (stable in the sense that they meet our previously
defined stability criteria). We stress that these states can
have several causes (e.g. metastability in
the flip-flops) and we cannot fully rule out the possibility
of the sampling apparatus ’interfering’ with the dynamics,
e.g. with the stimulus of the salt crystals. However, runs with a
fully conductive carbon plate as material give us no such
observed dynamics.

The vertical axis on this figure indicates the time as the
voltage is applied to the salt crystals. On the left hand side
we see the input to the system as it is captured by the same
method that captures the rest of the state data; it is of a more
regular nature (though with some sampling artifacts).

The complexity of the input compared to the complexity
of the output is currently under investigation. However, for
the purposes of this article the fact that the number of
observable input states ($2^3$) is much lower than the number of
potential observable output states ($2^{46}$) demonstrates that there is
potential for underlying $(DS)^2$ dynamics. The input is also
verified as ’stable’ in the same sense as the output, in that it is also
based on a number of repeats of the same run that are compared.

Conclusion 

Analyzing the behavior of the evolved systems in a $(DS)^2$
setting shows that both material samples explored have the
property of inducing perturbations to their own dynamics as
a function of their system states. In the state space traversal
graph this behavior is visualized, showing that the underlying
dynamics of the materials enable changes in the trajectory
in the state space. For the CNT sample the change in
trajectory is a product of electrical effects that emerge as
observable $(DS)^2$ traversal of the state space. For the salt
crystal/water sample topological reconfigurations are possible
but not detectable directly. The chosen observation level
is here also only based on sampled voltages, a result of
currents and charge in the material.

In the experiment dynamics were explicitly targeted, in
contrast to previous EIM work on similar CNT materials
(Clegg et al., 2014; Mohid et al., 2014). As the results show,
with the hybrid approach with a digital read out, dynamic
behavior is present. The presence of such behavior, and the
possibility to observe it, can expand the computational range of a
relatively simple EIM set-up from problems requiring only feed
forward networks, e.g. (Clegg et al., 2014), to computational
tasks requiring memory.

The $(DS)^2$ behavior at our chosen observation level shows
that it is not given that the system will behave predictably in
the sense that a given input and given state will produce the
same output: the current state might look the same, but
actually be a result of a different underlying dynamic process.
This does not imply that it is useless to use this system as
a basis for building computational systems or similar, but
rather that care must be taken when choosing and applying
stimulus to the system and also when we evaluate the output
by using a fitness function in an artificial evolution approach.
We suggest allowing the computational system to ’unfold’
over time, treating the apparent weakness of unpredictability
more as a strength. A way of doing this would be to not
consider the system using reductionism and divide its
functionality into smaller parts (e.g. individual gates), but rather
consider the system as a whole as a ’basic component’ and
its dynamic properties as a way of achieving computation.

Abstract

A common idiom in biology education states, “Eyes in the 
front, the animal hunts. Eyes on the side, the animal hides.” 
In this paper, we explore one possible explanation for why 
predators tend to have forward-facing, high-acuity visual sys¬ 
tems. We do so using an agent-based computational model 
of evolution, where predators and prey interact and adapt 
their behavior and morphology to one another over succes¬ 
sive generations of evolution. In this model, we observe a 
coevolutionary cycle between prey swarming behavior and 
the predator’s visual system, where the predator and prey 
continually adapt their visual system and behavior, respec¬ 
tively, over evolutionary time in reaction to one another due 
to the well-known “predator confusion effect.” Furthermore, 
we provide evidence that the predator visual system is what 
drives this coevolutionary cycle, and suggest that the cycle 
could be closed if the predator evolves a hybrid visual sys¬ 
tem capable of narrow, high-acuity vision for tracking prey 
as well as broad, coarse vision for prey discovery. Thus, the 
conflicting demands imposed on a predator’s visual system by 
the predator confusion effect could have led to the evolution 
of complex eyes in many predators. 


Keywords: swarming behavior, predator confusion effect,
predator-prey coevolution, visual acuity

Introduction 

“Eyes in the front, the animal hunts. Eyes on the side, the 
animal hides.” So goes the common idiom in biology edu¬ 
cation when teaching students how to classify animal skulls. 
It is widely believed that forward-facing, high-acuity visual 
systems play an important role in predation, for example, 
in dragonflies catching flying prey (Olberg, 2012). Despite 
this common observation, we have little empirical evidence 
explaining the evolutionary history of these focused visual 
systems observed in so many predators. In this paper, we ex¬ 
plore one hypothesis for why predators tend to evolve com¬ 
plex visual systems: the conflicting demands imposed on a 
predator’s visual system by the well-known “predator confu¬ 
sion effect” could have led to the evolution of complex eyes 
in many predators. 


In previous work, we have shown that not all prey evolve 
to respond to the presence of predators by hiding or flee¬ 
ing (Olson et al., 2013a,b; Haley et al., 2014, 2015; Olson 
et al., 2015, 2016). In fact, some prey species have evolved 
to stay together, form swarms, and defend themselves as 
a group for a variety of hypothesized reasons (Krause and 
Ruxton, 2002). For example, swarming is hypothesized to 
improve group vigilance (Pulliam, 1973; Treisman, 1975; 
Kenward, 1978; Treherne and Foster, 1981), reduce the 
chance of being encountered by predators (Treisman, 1975; 
Inman and Krebs, 1987), dilute an individual’s risk of be¬ 
ing attacked (Hamilton, 1971; Foster and Treherne, 1981; 
Treherne and Foster, 1982), and reduce predator attack effi¬ 
ciency by confusing the predator, i.e., the predator confusion 
effect (Jeschke and Tollrian, 2007; Ioannou et al., 2008). 
As such, swarming opens the possibility for an evolution¬ 
ary “arms race” between predators and their prey (Vermeij, 
1987), where the predator and prey continually adapt to one 
another over many generations of evolution. 

Here we use an agent-based computational model of evo¬ 
lution to study the coevolutionary dynamics between preda¬ 
tor and prey (Olson, 2015). We implement the predator con¬ 
fusion effect as a simple perceptual constraint on the preda¬ 
tor’s visual system, and allow both the predator and prey be¬ 
havior to coevolve over successive generations of evolution 
(as in Olson et al. 2013a). We further extend this work to 
allow the predator visual system to simultaneously evolve, 
which enables us to explore how the predator visual sys¬ 
tem adapts in response to the prey behavior. From these ex¬ 
periments, we discover a coevolutionary cycle between prey 
swarming behavior and the predator’s visual system. From 
further analysis, we discover that the predator visual sys¬ 
tem is the primary driver of this cycle: When the predator 
evolves a focused visual system, the prey evolve to disperse; 
whereas when the predator evolves a broad visual system, 
the prey evolve to swarm. Thus, we suggest that there is a 
selective advantage for predators that evolve a complex vi¬ 
sual system capable of both narrow, high-acuity vision for 
tracking prey as well as broad, coarse vision for prey
discovery.

Methods


To study the coevolutionary cycle between predator and 
prey, we create an agent-based model in which predator and 
prey agents interact in a continuous two-dimensional virtual 
environment. Each agent is controlled by a Markov Net¬ 
work (MN), which is a stochastic state machine that makes 
control decisions based on a combination of sensory inputs 
(i.e., vision) and internal states (i.e., memory) (Edlund et al., 
2011). We coevolve the MNs of predators and prey with a 
genetic algorithm, selecting for MNs that exhibit behaviors 
that are more effective at consuming prey and surviving, re¬ 
spectively. Certain properties of the sensory and motor be¬ 
havior of predators and prey are implemented as constraints 
that model some of the differences between predators and 
prey observed in nature (e.g., relative movement speed, turn¬ 
ing agility, and, for predators, maximum consumption rate). 
Predator confusion, described in more detail below, is im¬ 
plemented as a constraint on predator perception that can be 
varied experimentally. The source code 1 for these experi¬ 
ments is available online. In the remainder of this section, 
we summarize the evolutionary process that enables the co¬ 
evolution of predator and prey behavior and visual systems, 
describe the sensory-motor architecture of individual agents, 
then present the characteristics of the environment in which 
predator and prey interact. A detailed description of MNs 
and how they are evolved can be found in Olson et al. (2016). 

Coevolution of predator and prey 

We coevolve the predator and prey with a genetic algo¬ 
rithm (GA), which is a computational model of evolution 
by natural selection (Goldberg, 1989). In a GA, pools of 
genomes are evolved over time by evaluating the fitness of 
each genome at each generation and preferentially select¬ 
ing those with higher fitness to populate the next generation. 
The genomes here are variable-length lists of integers that 
are translated into MNs during fitness evaluation. Further¬ 
more, we allow the predator visual system to evolve by at¬ 
taching a single integer value to each predator genome that 
controls the predator view angle, i.e., the size of the arc that
the predator’s visual system covers (see Figure 1). 

1 Model code: https://github.com/adamilab/eos

Figure 1: An illustration of the predator and prey agents in
the model. Light grey triangles are prey agents and the dark
grey triangle is a predator agent. The prey agents have a
180° limited-distance visual system (100 virtual meters) to
observe their surroundings and detect the presence of the
predator and prey agents, whereas the predator agents have
a variable-sized visual system that can see for 200 virtual
meters. “PF” and “PR” correspond to the sensors just to
the left and right of the agent’s heading, respectively. Each
agent has its own Markov Network, which decides where to
move next based on a combination of sensory inputs and
memory. The left and right actuators (labeled “L” and “R”)
enable the agents to move forward, left, and right in discrete
steps.

The coevolutionary process operates as follows. First,
we create separate genome pools for the predator and prey
genomes. Next, we evaluate the genomes’ fitness by selecting
pairs of predator and prey genomes at random without
replacement, then place each pair into a simulation
environment and evaluate them for 2,000 simulation time steps.
Within this simulation environment, we generate 50 identical
prey agents from the single prey genome and compete
them with the single predator agent to obtain their respective
fitness. This evaluation period is akin to the agents’ lifespan,
hence each agent has a potential lifespan of 2,000 time steps.
The fitness values, calculated using the fitness function
described below, are used to determine the next generation of
the respective genome pools. At the end of the lifetime
simulation, we assign the predator and prey genomes separate
fitness values according to the fitness functions:


$$W_{\mathrm{predator}} = \sum_{t=1}^{2{,}000} (S - A_t) \qquad (1)$$

$$W_{\mathrm{prey}} = \sum_{t=1}^{2{,}000} A_t \qquad (2)$$

where $t$ is the current simulation time step, $S$ is the starting
swarm size (here, $S = 50$), and $A_t$ is the number of prey
agents alive at simulation time step $t$. It can be shown that
the predator fitness (Eq. 1) is proportional to the mean kill
rate $k$ (mean number of prey consumed per time step), while
the prey fitness (Eq. 2) is proportional to $(1 - k)$. Thus,
predators are awarded higher fitness for capturing more prey
faster, and prey in turn are rewarded for surviving longer.
We only simulate a portion of the prey’s lifespan where they
are under predation because we are investigating swarming
as a response to predation, rather than a feeding or mating
behavior.
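
To make the two fitness functions concrete, the sketch below evaluates them on a recorded survival curve. The function name and the list representation of $A_t$ are assumptions made for the example; the formulas follow the definitions of $S$ and $A_t$ given in the text.

```python
def fitness_values(alive_per_step, swarm_size=50):
    """Return (W_predator, W_prey) for one simulation.

    alive_per_step[t - 1] is A_t, the number of prey alive at time step t.
    """
    w_predator = sum(swarm_size - a for a in alive_per_step)
    w_prey = sum(alive_per_step)
    return w_predator, w_prey
```

For a four-step toy run with `alive_per_step = [50, 49, 49, 48]`, this gives a predator fitness of 4 and a prey fitness of 196. The two values always sum to $S$ times the number of time steps, so a gain for one side is an equal loss for the other, and an earlier kill contributes to more terms of the predator's sum than a later one.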

In this case, we use a GA with a population size of 100 
(100 prey, 100 predators), per-gene mutation rate of 1%, 
gene duplication rate of 5%, gene deletion rate of 2%, and 
mutation rate of 5% for the predator visual system that adds 
a number in the range [-50°, 50°] to the arc size while keeping
it constrained between [1°, 360°].
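
The view angle mutation can be sketched as follows. The text gives the mutation rate, the step range and the bounds; drawing the step uniformly from the integers in that range and enforcing the bounds by clamping are assumptions, and the function name is hypothetical.

```python
import random


def mutate_view_angle(angle, rng, rate=0.05, max_step=50, lo=1, hi=360):
    """With probability `rate`, shift the view angle by up to +/- max_step degrees.

    The uniform integer step and the clamping to [lo, hi] are assumptions.
    """
    if rng.random() < rate:
        angle += rng.randint(-max_step, max_step)
    return min(hi, max(lo, angle))
```

A call such as `mutate_view_angle(180, random.Random(0))` returns either 180 (no mutation, the common case at a 5% rate) or a value between 130 and 230.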

Once we evaluate all of the predator-prey genome pairs 
in a generation, we perform fitness-proportionate selection 
on the populations via a Moran process (Moran, 1962), al¬ 
low the selected genomes to asexually reproduce into the 
next generation’s populations, apply random mutations to 
the newborn offspring, increment the generation counter, 
and repeat the evaluation process on the new populations 
until the final generation (25,000) is reached. 
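
The selection step can be illustrated with a simplified, generational form of fitness-proportionate sampling. This sketch is not the Moran process implementation used in the model; it only shows the core idea that a genome's chance of leaving offspring is proportional to its fitness. The function name, the uniform fallback when all fitness values are zero, and the optional `mutate` callback are assumptions.

```python
import random


def next_generation(population, fitnesses, rng, mutate=None):
    """Sample parents with probability proportional to fitness, then copy
    (asexual reproduction) and optionally mutate each offspring."""
    if sum(fitnesses) > 0:
        parents = rng.choices(population, weights=fitnesses, k=len(population))
    else:
        # Degenerate case (all fitness values zero): sample uniformly instead.
        parents = rng.choices(population, k=len(population))
    offspring = [list(parent) for parent in parents]
    if mutate is not None:
        offspring = [mutate(child, rng) for child in offspring]
    return offspring
```

The population size stays constant from one generation to the next, and genomes with zero fitness leave no offspring unless every genome has zero fitness.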

We perform 30 replicates of each experiment, where for
each replicate we seed the prey population with a set of
randomly-generated MNs and the predator population with
a pre-evolved predator MN that exhibits rudimentary
prey-tracking behavior with a 180° visual system. Seeding the
predator population in this manner only serves to speed up
the coevolutionary process, and has negligible effects on the
outcome of the experiment (Figure S1 from Olson et al.
2013a).

Predator and prey agents 

Figure 1 depicts the sensory-motor architecture of predator 
and prey agents in this system. The retina sensors of both 
predator and prey agents are logically organized into “lay¬ 
ers,” where a layer includes 12 sensors, with each sensor 
having a field of view of 15° and a range of 100 virtual me¬ 
ters. Moreover, each layer is attuned to sensing a specific 
type of agent. Specifically, the predator agents have a single¬ 
layer retina that is only capable of sensing prey. In contrast, 
the prey agents have a dual-layer retina, where one layer is 
able to sense conspecifics, and the other senses the predator. 
(We note that there is only a single predator active during 
each simulation, hence the lack of a predator-sensing retinal 
layer for the predator agent.) 

Regardless of the number of agents present in a single
retina slice, the agents only know the agent type(s) that
reside within that slice, but not how many, representing
the wide, relatively coarse-grain visual systems typical in
swarming birds such as starlings (Martin, 1986). For example
in Figure 1, the fourth retina slice to the right (labeled
“A”) has one prey (light grey triangle) and two predators
(dark grey triangles) in it, so both the predator and prey
sensors activate and inform the MN that one or more predators
and one or more prey are currently in that slice. Furthermore,
since the prey near the seventh retina slice from the
left is just outside the range of the retina slice, the prey
sensor for that slice does not activate. We note that although the
agent’s sensors do not report the number of agents present
in a single retina slice, this constraint does not preclude the
agent’s MN from evolving and making use of a counting
mechanism which reports the number of agents present in
a set of retina slices. Once provided with its sensory
information, the prey agent chooses one of four discrete actions:
(1) stay still; (2) move forward 1 unit; (3) turn left 8° while
moving forward 1 unit; or (4) turn right 8° while moving
forward 1 unit.
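
The retina geometry described above (12 slices of 15° each with a limited range) can be sketched as a function that maps a target position to the slice it falls in. The coordinate conventions are assumptions made for the example: headings are measured in degrees counter-clockwise from the x-axis, and slice indices increase with the bearing. The function name is hypothetical.

```python
import math


def active_slice(agent_xy, heading_deg, target_xy,
                 n_slices=12, slice_deg=15.0, view_range=100.0):
    """Return the index of the retina slice containing the target, or None
    if the target is out of range or outside the field of view."""
    dx = target_xy[0] - agent_xy[0]
    dy = target_xy[1] - agent_xy[1]
    if math.hypot(dx, dy) > view_range:
        return None
    bearing = math.degrees(math.atan2(dy, dx)) - heading_deg
    bearing = (bearing + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    half_fov = n_slices * slice_deg / 2.0
    if not -half_fov <= bearing < half_fov:
        return None
    return int((bearing + half_fov) // slice_deg)
```

A sensor layer is then just the set of slice indices returned for all agents of the relevant type, which discards how many agents occupy each slice, as described in the text.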

Likewise, the predator agent detects nearby prey agents
using a limited-distance (200 virtual meters), segmented 
retina covering an evolvable angle in front of the predator 
that functions just like the prey agent’s retina. Similar to the 
prey agents, predator agents make decisions about where to 
move next, but the predator agents move 3x faster than the 
prey agents and turn correspondingly slower (6° per simula¬ 
tion time step) due to their higher speed. 

Simulation environment 

We use a simulation environment to evaluate the relative per¬ 
formance of the predator and prey agents. At the beginning 
of every simulation, we place a single predator agent and 50 
prey agents at random locations inside a closed 512 x 512 
unit two-dimensional simulation environment. Each of the
50 prey agents is controlled by a clone of the particular
prey MN being evaluated. We evaluate the swarm with
clonal MNs to eliminate any possible effects of selection at 
the individual level, e.g., the “selfish herd” effect (Wood and 
Ackland, 2007; Olson et al., 2013b). 

During each simulation time step, we provide all agents 
their sensory input, update their MN, then allow the MN to 
make a decision about where to move next. When the preda¬ 
tor agent moves within 5 virtual meters of a prey agent it can 
see (i.e., the prey agent is anywhere within the predator’s vi¬ 
sual field), it automatically makes an attack attempt on that 
prey agent. If the attack attempt is successful, the target 
prey agent is removed from the simulation and marked as 
consumed. Predator agents are limited to one attack attempt 
every 10 simulation time steps, which is called the handling 
time. The handling time represents the time it takes to con¬ 
sume and digest a prey after successful prey capture, or the 
time it takes to refocus on another prey in the case of an un¬ 
successful attack attempt. Shorter handling times have neg¬ 
ligible effects on the outcome of the experiment, except for 
when there is no handling time at all (Figure S2 from Olson 
et al. (2013a)).
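A minimal sketch of the attack rule and handling time described above. The 5 virtual meter attack range and the 10 time step handling time come from the text. The `Predator` class, its method name, and the treatment of "within 5 virtual meters" as an inclusive bound are illustrative assumptions.

```python
import random

ATTACK_RANGE = 5.0   # virtual meters (from the text)
HANDLING_TIME = 10   # time steps between attack attempts (from the text)


class Predator:
    def __init__(self):
        # Earliest simulation time step at which the next attack is allowed.
        self.next_attack_step = 0

    def try_attack(self, step, distance, prey_visible, p_capture, rng=random):
        """Return True on capture, False on a failed attempt, None if no attempt."""
        if step < self.next_attack_step:
            return None  # still handling the previous attempt
        if not prey_visible or distance > ATTACK_RANGE:
            return None  # no visible prey in range
        # Both successful and unsuccessful attempts start the handling time.
        self.next_attack_step = step + HANDLING_TIME
        return rng.random() < p_capture
```

Because the handling time applies after failed attempts as well, any reduction in the per-attack capture probability also reduces the number of prey a predator can consume in a fixed lifetime.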

To investigate predator confusion as an indirect selection
pressure driving the coevolution of swarming, we implement
a perceptual constraint on the predator agent. When
the predator confusion mechanism is active, the predator
agent’s chance of successfully capturing its target prey agent
($P_{capture}$) is diminished when any prey agents near the target
prey agent are visible anywhere in the predator’s visual
field. This perceptual constraint is similar to previous models
of predator confusion based on observations from natural
predator-prey systems (M. Jeschke and Tollrian, 2005;
Jeschke and Tollrian, 2007; Ioannou et al., 2008), where the
predator’s attack efficiency (# successful attacks / total # attacks)
is reduced when attacking swarms of higher density.

$P_{capture}$ is determined by the equation $P_{capture} = 1 / A_{NV}$,
where $A_{NV}$ is the number of prey agents that are visible to
the predator, i.e., anywhere in the predator agent’s visual
field, and within 30 virtual meters of the target prey. By
only counting prey near the target prey, this mechanism localizes
the predator confusion effect to the predator’s retina,
and enables us to experimentally control the strength of the
predator confusion effect.

Although our predator confusion model is based on the
predator’s retina, it is functionally equivalent to previous
models that are based on the total swarm size (Figure 2,
dashed line), see, e.g., (M. Jeschke and Tollrian, 2005;
Tosh et al., 2006; Jeschke and Tollrian, 2007; Ioannou et al.,
2008). As shown in Figure 2 (solid line with triangles), the
predator has a 50% chance of capturing a prey with one visible
prey near the target prey ($A_{NV} = 2$), a 33% chance of
capturing a prey with two visible prey near the target prey
($A_{NV} = 3$), etc. As a consequence, prey are in principle
able to exploit the combined effects of predator confusion
and handling time by swarming.

Figure 2: Relation of predator attack efficiency (# successful
attacks / total # attacks) to number of prey. The solid
line with triangles indicates predator attack efficiency as a
function of the number of prey within the visual field of the
predator ($A_{NV}$). Similarly, the dashed line with error bars
shows the actual predator attack efficiency given the predator
attacks a group of swarming prey of a given size, using
the $A_{NV}$ curve to determine the per-attack predator attack
success rate. Error bars indicate two standard errors over
100 replicate experiments.

Results

To evaluate the evolved prey behavior quantitatively, we ob¬ 
tain the line of descent (LOD) for every replicate experi¬ 
ment by tracing the ancestors of the most-fit prey MNs in 
the final population until we reach the randomly-generated 
ancestral MN with which the starting population was seeded 
(see Lenski et al. 2003 for an introduction to the concept 
of a LOD in the context of computational evolution). For 
each ancestor in the LOD, we characterized the behavior 
with a common behavior measurement called swarm den¬ 
sity (Huepe and Aldana, 2008). We measured the swarm 
density as the mean number of prey within 30 virtual meters 
of each other over a lifespan of 2,000 simulation time steps, 
which provides an indication of how closely the prey are 
staying near each other on average. Similarly, we evaluate 
the predator’s view angle by tracing the LOD of the most-fit 
predator and observing the view angle of each ancestor. 
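The swarm density measure can be sketched as follows. This is one plausible reading of "the mean number of prey within 30 virtual meters of each other": for each prey, count the other prey within the radius, average over prey, then average over time steps. The function names and the list-of-positions data layout are assumptions.

```python
import math


def swarm_density_step(positions, radius=30.0):
    """Mean number of other prey within `radius` of each prey at one time step."""
    if not positions:
        return 0.0
    neighbours = 0
    for i, (xi, yi) in enumerate(positions):
        for j, (xj, yj) in enumerate(positions):
            if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                neighbours += 1
    return neighbours / len(positions)


def swarm_density(history, radius=30.0):
    """Average the per-step density over a lifespan (a list of position lists)."""
    return sum(swarm_density_step(p, radius) for p in history) / len(history)
```

Under this reading, a value below 1 means that a typical prey has no neighbour within the radius for most of its lifetime, whereas a value of about 6 means it is usually surrounded by about six others.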

In Olson et al. 2013a, we showed that when the predator’s 
visual system only covered the frontal 60° or less, swarming 
to confuse the predator was no longer a viable adaptation 
(as indicated by a mean swarm density of 0.68 ± 0.02 at
generation 1,200). In this case, the predator had such a nar¬ 
row view angle that few swarming prey were visible during 
an attack, which minimizes the confusion effect and corre¬ 
spondingly increases the predator’s capture rate (Figure S8 
from Olson et al. 2013a). When the predator’s visual sys¬ 
tem was incrementally modified to cover the frontal 120° 
and beyond, swarming again became an effective adaptation 
against the predator due to the confusion effect (indicated by 
a mean swarm density of 6.13 ± 0.76 at generation 1,200).
This suggests that the predator confusion mechanism may 
not only provide a selective pressure for the prey to swarm, 
but it could also provide a selective pressure for the predator 
to narrow its view angle to become less easily confused. 
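The effect of view angle on confusion can be illustrated with a short sketch of the capture rule from the simulation environment description, in which the capture probability is the reciprocal of the number of visible prey near the target (50% for two, 33% for three). The geometry is an assumption: the visual field is modeled as a sector centered on the predator's heading, and the function names are hypothetical. The 200 virtual meter range and the 30 virtual meter neighbourhood come from the text.

```python
import math


def in_visual_field(pred_xy, heading_deg, view_angle_deg, prey_xy, max_range=200.0):
    """True if the prey lies inside a sector centered on the predator's heading."""
    dx, dy = prey_xy[0] - pred_xy[0], prey_xy[1] - pred_xy[1]
    if math.hypot(dx, dy) > max_range:
        return False
    bearing = math.degrees(math.atan2(dy, dx))
    # Signed angular offset from the heading, wrapped to [-180, 180).
    offset = (bearing - heading_deg + 180.0) % 360.0 - 180.0
    return abs(offset) <= view_angle_deg / 2.0


def capture_probability(pred_xy, heading_deg, view_angle_deg, target_xy, prey, near=30.0):
    """1 / A_NV, counting visible prey within `near` of the target (target included)."""
    a_nv = 0
    for p in prey:
        close = math.hypot(p[0] - target_xy[0], p[1] - target_xy[1]) <= near
        if close and in_visual_field(pred_xy, heading_deg, view_angle_deg, p):
            a_nv += 1
    # Guard for the case where the target itself is not counted as visible.
    return 1.0 / max(a_nv, 1)
```

With the predator at the origin facing a target at (4, 0) and a second prey at (4, 3), a 180° view angle sees both prey and yields a capture probability of 0.5, while a 60° view angle excludes the second prey and yields 1.0.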

When we allow the predator view angle to coevolve along 
with the predator and prey behavior, we observe that the 
predator populations do indeed evolve focused visual sys¬ 
tems in response to prey swarming behavior (Figure 3), as 
indicated by the predator view angle evolving to < 100° 
once the prey begin to swarm. Interestingly, the predator and 
prey populations appear to repeatedly cycle between differ¬ 
ent states of view angles and behaviors, respectively, such 
that there is a significant negative correlation between the 
predator view angle and swarm density across all 30 coevo¬ 
lution experiments (Figure 4). This finding is surprising be¬ 
cause the predator population should be able to effectively 
“defeat” the swarming prey population by shrinking their vi¬ 
sual system to the point that the prey will no longer evolve to 
swarm. Why then would the predator population evolve to 
widen their visual system once the prey evolve to disperse, 
and allow the prey population to again evolve swarming be¬ 
havior to reduce the predators’ attack efficiency? 

As shown in Figure 5, when predators with fixed view angles
are competed against dispersive prey, the predators with
broader visual systems are more likely to find a prey anywhere
in their visual system at any time. Further, Figure 6
demonstrates that predators with broader visual systems are
also more likely to find dispersive prey in a portion of their
visual system that they pay attention to, which means they
spend less time searching for prey. Thus, the increased
foraging efficiency that broader visual systems offer predators
against dispersive prey results in higher predator fitness
(Figure 7), which explains why predators evolve higher view
angles in the presence of dispersive prey.

Figure 3: Swarm density and predator view angle from the
LOD of a single coevolution experiment. The predator and prey
populations repeatedly cycle between different states of view
angles and behaviors.

Figure 4: Pearson’s r between swarm density and predator
view angle from the LODs of 30 coevolution experiments.
All coevolution experiments have a negative correlation between
swarm density and predator view angle, indicating
that when swarm density goes up, predator view angle goes
down, and vice versa. P ≤ 0.001 for all correlations.

Figure 5: Number of simulation time steps that prey are
present anywhere in an evolved predator’s visual system depending
on the predator’s view angle. Each box plot represents
30 replicates, and the notches represent the 95% confidence
interval around the median. Here, the predator is competed
against dispersive prey. Predators with higher view
angles are more likely to have prey anywhere in their visual
system at a given time. P ≤ 0.001 between all view angles
with a Kruskal-Wallis multiple comparison.

Figure 6: Number of simulation time steps that prey are visible
in a portion of an evolved predator’s visual system that
it pays attention to, depending on the predator’s view angle.
Each box plot represents 30 replicates, and the notches represent
the 95% confidence interval around the median. Here,
the predator is competed against dispersive prey. Predators
with higher view angles are more likely to spot prey
at a given time, which increases their foraging efficiency.
P ≤ 0.001 between all view angles except 180 vs. 210
with a Kruskal-Wallis multiple comparison.

Figure 7: Fitness of an evolved predator when competed
against dispersive prey, depending on the predator’s view angle.
Each box plot represents 30 replicates, and the notches
represent the 95% confidence interval around the median.
Predators with higher view angles forage for prey more efficiently,
thus capturing more prey in their lifetime and improving
their fitness. P ≤ 0.001 between all view angles
except 150 vs. 180 and 180 vs. 210, Kruskal-Wallis multiple
comparison.

Figure 8: Diagram depicting the observed coevolutionary
cycle between the predator and prey in the presence of the
predator confusion effect.

Discussion 

As demonstrated in Figure 3, selection favors predators with 
a more focused visual system once swarming has evolved in 
prey. However, once the predators evolve a focused visual 
system, the prey evolve dispersive behavior in response and 
a coevolutionary cycle commences between the predator vi¬ 
sual system and prey behavior. Some time ago, researchers 
commonly assumed that the evolution of social behavior is a 
one-way street, and that once social integration has arisen 
in a population it must be so advantageous (compared to 
the cost of living in close proximity to conspecifics) that it 
would not be lost (Wcislo and Danforth, 1997). Our findings 
demonstrate that at least one form of social behavior—the 
tendency to form cohesive swarms—can readily be gained 
and lost, and that the gain and loss is governed by a coevo¬ 
lutionary cycle that could occur between natural predators 
and prey due to the predator confusion effect, as depicted in 
Figure 8. 

Furthermore, the findings in this paper highlight a trade¬ 
off that natural predators likely experience when hunting for 
prey: Broader, less-focused visual systems are more useful 
for initially spotting prey, but focused visual systems are better
adapted for tracking an individual prey down and avoiding
the effects of predator confusion when hunting prey in
groups. Thus, these conflicting demands imposed on the 
predator’s visual system by the predator confusion effect 
could select for the evolution of complex eyes that satisfy 
both needs. Indeed, many animals, both vertebrates and
invertebrates, do have such complexity in the arrangement
of their retinae, including the presence of a fovea in ver¬ 
tebrates (Moore et al., 2012) or “acute zones” in inverte¬ 
brates (Land, 1997). Our system could not have evolved 
such complexity because the retinal slices could not vary in¬ 
dependently. 

In future work, we plan to implement a more advanced 
predator visual system that will allow the number of retina 
slices to vary, and allow each individual slice to vary in size. 
Through such a visual system, evolution will be capable of 
adjusting each retina slice as needed and allow us to explore 
under what conditions complex eyes will evolve. Another 
interesting approach would be to allow the prey visual sys¬ 
tem to coevolve as well in order to explore how evolution 
shapes prey visual systems in response to predation. 

Conclusions 

In this paper, we implemented a computational model of 
evolution that allowed us to explore the coevolution of 
predator and prey morphology and behavior. In particular, 
we explored the coevolution of the predator’s visual system 
and prey behavior and discovered that a repeated coevolu¬ 
tionary cycle occurs when we introduce the predator con¬ 
fusion effect. Furthermore, we provided evidence that the 
predator visual system is what drives this coevolutionary cy¬ 
cle, and suggested that the cycle could be closed if the preda¬ 
tor evolves a hybrid visual system capable of narrow, high- 
acuity vision for tracking prey as well as broad, coarse vision 
for prey discovery. Thus, the conflicting demands imposed 
on a predator’s visual system by the predator confusion ef¬ 
fect could have led to the evolution of complex eyes in many 
predators. 
Abstract

Previous research has demonstrated that computational models
of Gene Regulatory Networks (GRNs) can adapt so as to
increase their evolvability, where evolvability is defined as
a population’s responsiveness to environmental change. In
such previous work, phenotypes have been represented as bit
strings formed by concatenating the activations of the GRN
after simulation. This research extends that work: previous
results supporting the evolvability of GRNs are replicated,
but the phenotype space is enriched with temporal and
spatial dynamics through an evolutionary robotics task
environment. It was found that a GRN encoding used in the
evolution of a way-point navigation behavior in a fluctuating
environment results in (robot controller) populations becoming
significantly more responsive (evolvable) over time. This
is in contrast to a direct encoding of controllers, which was
unable to improve its evolvability in the same task environment.

Introduction 

An open question in artificial and natural life is whether 
digital and natural organisms undergoing an evolutionary 
process are able to become more responsive to changes in 
their environment, that is to become more evolvable (Wag¬ 
ner and Altenberg, 1996a). A prevailing hypothesis is that 
if the environment sufficiently varies over time, then organisms
evolve the ability to evolve suitable adaptations
to such environmental changes faster (Wagner and
Altenberg, 1996a; Draghi and Wagner, 2008). Crombach
and Hogeweg (2008) as well as Draghi and Wagner (2008) 
have demonstrated that computational models of Gene Reg¬ 
ulatory Networks (GRNs) exhibit such evolvability. This 
study’s main goal is to replicate the results of this previous 
research (Crombach and Hogeweg, 2008; Draghi and Wag¬ 
ner, 2008), but in the context of evolutionary robotics (Nolfi 
and Floreano, 2000) experiments that test robot controller 
(behavior) evolution in environments where the goal tasks 
vary over time. 

The representation problem in Evolutionary Computation 
(EC) (Eiben and Smith, 2003) addresses the issue of how 
to represent and adapt (mutate and recombine) genotypes 
such that a broad range of complex solutions is represented
by relatively simple genotype encodings (Wagner and
Altenberg, 1996b). Representation choice and associated
operators have a significant impact on the evolution of viable
solutions, and representations which facilitate evolution
have been termed evolvable (Wagner and Altenberg, 1996b; 
Rothlauf, 2006a; Simoes et al., 2014). Similarly, in nature, 
genetic information defining the form and function of an organism
is stored within its genotype; however, the developmental
process which translates this information into phenotypes
is not well understood (Pigliucci, 2010). It has become
clear that the mapping between genotype and phenotype
is neither one-to-one nor linear (Gjuvsland et al., 2013).
In many organisms, including the case of Ribonucleic acid 
(RNA) folding (Draper, 1992), it has been found that many 
genotypes can code for a single phenotype and that genetic 
change resulting from mutation is not proportional to phe¬ 
notypic change (Pigliucci, 2010; Parter et al., 2008). 

In EC, this is known as a developmental or generative (indirect
encoding) genotype representation (Stanley and Miikkulainen,
2003), where effects of mutations are not only
determined by representation and associated mutation op¬ 
erators, but also by the population’s position in genotype 
space. This is distinct from a one-to-one mapping (direct 
encoding) where, for a given phenotype and its associated 
genotype, mutational effects are determined by the repre¬ 
sentation, mutation operators and fitness function. Thus 
the population’s location (genotype values) in the genotype 
space can be viewed as an integral component of represen¬ 
tation (Rothlauf, 2006b). 

An open question in biology is whether developmental 
representations have occurred by chance, or if such repre¬ 
sentations have also been subject to evolution (Parter et al., 
2008). A current hypothesis is that an organism’s genotype
representation is itself evolvable due to the evolution
of evolvability (Wagner and Altenberg, 1996b; Pigliucci,
2008). However, this is complicated by multiple definitions
of evolvability¹ in both evolutionary biology (Pigliucci,
2008, 2010; Parter et al., 2008) and EC (Tarapore
and Mouret, 2014). For example, evolvability can refer
to either populations or individuals (Wilder and Stanley,
2015). Similarly, within EC, numerous definitions and associated
metrics have been proposed, for example, those that
focus exclusively on solution fitness (Grefenstette, 1999) or
variability of offspring (Lehman and Stanley, 2013). Tarapore
and Mouret (2014) developed a
metric which incorporated both the fitness and diversity of
offspring.

¹ For a review of evolvability in biology, the reader is referred
to Pigliucci (2008).

The principal aim of this work is to extend the demonstration
of evolvability in GRNs by Crombach and Hogeweg
(2008) as well as Draghi and Wagner (2008) to an evolutionary
robotics domain. In these previous studies a population’s
evolvability was defined as its responsiveness, that
is, the population’s ability to rapidly adapt to changes in
the fitness landscape. This work maintains consistency with 
this definition. Hence, we define evolvability to be tanta¬ 
mount to a population’s adaptability (Kirschner and Gerhart, 
1998). This implies that we do not predefine sets of features 
that will likely propagate beneficial phenotypes (behaviors) 
in the evolutionary process. Rather, in line with biological 
literature (Pigliucci, 2008; Flatt, 2005), evolvability is an or¬ 
ganism’s (genotype’s) capability to adapt and survive in its 
environment. 

We investigate evolvability in the context of an evolution¬ 
ary robotics task, where robot controller (behavior) evolu¬ 
tion is tested using both indirect (GRN) and direct genotype 
encodings. The hypothesis is that an indirect encoding facilitates
the improvement of responsiveness over the course of
evolution in a robotics task domain with changing task environments,
whereas a direct encoding does not. Here we use
responsiveness and evolvability interchangeably to mean the 
speed with which a population adapts given task environment
changes. The robotics task was way-point navigation,
where responsiveness was tested by having the environment
fluctuate between task variants. To facilitate this,
two different way-point layouts were used.

Results indicated that evolution using the indirect (GRN) 
encoding facilitated the evolution of controllers that were 
significantly more responsive (adapted) to task environment 
fluctuations over evolutionary time. Comparatively, evolu¬ 
tion using the direct (bit-string) encoding applied to this task 
indicated that evolved behaviors were unresponsive and un¬ 
able to appropriately adapt to task environment variations
over time.

Methods

Simulation and Task Environment 

The evolutionary robotics simulation used a bounded two-dimensional
continuous environment², overlaid with a
400 x 200 grid for ease of specifying
task environment and simulation parameters (table 1).

² An extension of the RoboRobo simulator (Bredeche et al.,
2013) was used for all experiments.


The task tested was way-point navigation. During controller
evolution, the task variants were switched in order to simulate
a fluctuating environment (Evolutionary Algorithm Section).
Two task variants were specified, each requiring the
robot to pass within a given distance (table 1) of a pre-specified
number of way-points. The task required the robot
to pass the way-points in a specific order and within its lifetime
(a given number of simulation time steps, table 1). The
number of way-points the robot passed by during its lifetime
equalled its fitness.

Figure 1 shows the layout of the way-points for each of 
the two task variants, where the way-points were specially 
positioned to encourage the emergence of a wall-following 
behavior. 


Robot Controller 

The robot controller was a fully connected feed-forward 
Artificial Neural Network (ANN) with twelve connection 
weights. That is, three hidden layer neurons (Sigmoidal 
units) connected to two sensory input and two motor output
neurons. The two sensory inputs were distance sensors,
each placed π/3 radians on either side of the direction in
which the robot was facing. These sensors operated similarly
to infrared proximity sensors, casting a ray along the sensor’s
field of view. If this ray intersected a wall within the sensor’s
range, then the sensor’s reading was d/r, where d was the
distance to the wall and r was the sensor’s range. If there 
was no wall in sensor range, then the sensor reading was 
1.0. The controller’s motor outputs determined the robot’s 
speed and heading, where outputs were normalized in the 
range [0.0,1.0] and corresponded to minimum and maxi¬ 
mum speed and heading values (table 1). 
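A minimal sketch of such a controller, assuming no bias terms (the text counts exactly twelve connection weights: 2 x 3 input-to-hidden plus 3 x 2 hidden-to-output). The ordering of weights in the flat list and the use of a sigmoid on the output neurons to keep outputs in [0.0, 1.0] are assumptions, as are the function names. The 75 unit sensor range is taken from table 1.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sensor_reading(distance_to_wall, sensor_range=75.0):
    """d/r if a wall is within range, otherwise 1.0 (None means no wall was hit)."""
    if distance_to_wall is None or distance_to_wall > sensor_range:
        return 1.0
    return distance_to_wall / sensor_range


def controller(weights, sensors):
    """Feed-forward pass: 2 inputs -> 3 sigmoidal hidden units -> 2 outputs."""
    assert len(weights) == 12 and len(sensors) == 2
    # Assumed layout: first six weights feed the hidden layer, last six the outputs.
    w_hidden = [weights[0:2], weights[2:4], weights[4:6]]
    w_output = [weights[6:9], weights[9:12]]
    hidden = [sigmoid(sum(w * s for w, s in zip(row, sensors))) for row in w_hidden]
    # Outputs in (0, 1) are later scaled to speed and heading values.
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_output]
```

For example, `controller([0.0] * 12, [1.0, 1.0])` returns `[0.5, 0.5]`, since every weighted sum is zero.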

Gene Regulatory Network 

The Gene Regulatory Network (GRN) model for robot con¬ 
troller encoding is based on that used in previous related 
work (Crombach and Hogeweg, 2008; Draghi and Wagner, 
2008). Nodes in the GRN are genes, and connections be¬ 
tween the nodes are either excitatory or inhibitory. All nodes 
are updated synchronously via equation 1. 



$$S_i(t+1) = \begin{cases} 1 & \text{if } \sum_j w_{ij} S_j(t) > \theta_i \\ 0 & \text{if } \sum_j w_{ij} S_j(t) \le \theta_i \end{cases} \qquad (1)$$

where $S_i(t)$ is the activation of the $i$th node at simulation
iteration $t$, $w_{ij}$ is the connection weight of the directed edge
from the $i$th to the $j$th node, and $\theta_i$ is the threshold of the
$i$th node. If no such connection exists then $w_{ij} = 0$. Table 2
presents the GRN parameters. In order to facilitate the conversion
of activations into bit-strings all nodes were given
a unique value in the range $[0, l]$, where $l$ is one less than
the number of nodes (Gene Regulatory Network Encoding
section).
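The synchronous update of equation 1 can be sketched as below. This is an illustration under assumptions: activations are taken to be binary, a node switches on when its weighted input exceeds its threshold, and the strictness of that comparison is a guess. The function name and the nested-list weight layout are hypothetical.

```python
def grn_step(state, weights, thresholds):
    """One synchronous update: every node reads the previous state of all nodes."""
    n = len(state)
    new_state = []
    for i in range(n):
        # weights[i][j] multiplies the activation of node j; 0 means no connection.
        total = sum(weights[i][j] * state[j] for j in range(n))
        # Assumption: strict comparison against the node's threshold.
        new_state.append(1 if total > thresholds[i] else 0)
    return new_state
```

Because the new state is built from a copy rather than updated in place, the order in which nodes are visited does not affect the result, which is what makes the update synchronous.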
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Figure 1: Visualization of the way-point navigation task for each of the two task variants (left and right). Way-points (brown 
dots) line the top-right and top-left corners of each task variant. The circles surrounding each way-point represent the distance 
within which the robot must pass the way-point. The robot is presented as the blue dot on either side (far left and far right) with 
two lines extending (representing sensor fields of view). 


Mutation Operators. Table 3 specifies the mutation op¬ 
erators used in the Evolutionary Algorithm (EA) applied to 
evolve the GRN. The mutate_weight, add_edge, and 
delete_edge operators were applied to the GRN node 
connections, where for every connection, the operator was 
applied with a given probability (table 4). All other mutation
operators acted on node activations and thresholds
with a given probability (table 4).

Binary (Direct) Encoding. A direct mapping function 
was used to map the binary genotype encoding to an ANN 
controller. To convert the sixty element binary string geno¬ 
type to the twelve connection weight values which specify 
an ANN controller, the genotype string was split into twelve 
smaller strings of five elements each. These twelve strings 
were then converted into real numbers in the range [0.0,1.0] 
using equation 2. 


$$\frac{\sum_{i \in \{0 \ldots 4\}} a_i 2^i}{\sum_{i \in \{0 \ldots 4\}} 2^i} \qquad (2)$$

where $a_i$ is the $i$th element of substring $a$.
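The direct mapping can be sketched as follows, assuming each five-bit substring is read as an unsigned integer and divided by its maximum value (31) so that the result lies in [0.0, 1.0]. The bit order (element 0 as the least significant bit) and the function name are assumptions.

```python
def decode_genotype(bits, n_weights=12, bits_per_weight=5):
    """Split a 60-bit genotype into 12 substrings and map each to [0.0, 1.0]."""
    assert len(bits) == n_weights * bits_per_weight
    max_value = 2 ** bits_per_weight - 1  # 31 for five bits
    weights = []
    for k in range(n_weights):
        a = bits[k * bits_per_weight:(k + 1) * bits_per_weight]
        # Assumption: a[0] is the least significant bit.
        value = sum(a_i * 2 ** i for i, a_i in enumerate(a))
        weights.append(value / max_value)
    return weights
```

With five bits per weight, each connection weight can take only 32 distinct values, so a single bit-flip changes a weight by between 1/31 and 16/31 of its range.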


Gene Regulatory Network (Indirect) Encoding. Using 
methods from previous work (Crombach and Hogeweg, 
2008; Draghi and Wagner, 2008), the GRN was simulated 
for a given number of iterations (table 2). During this simu¬ 
lation time, convergence to a point attractor was tested for by 
determining whether node activations in the last and penulti¬ 
mate iterations were identical. If the GRN did not settle on a 
point attractor then it was marked for removal and no further 
evaluation took place. Preliminary testing indicated that the 
removal of these GRNs had a negligible effect on the evo¬ 
lutionary dynamics in this study’s experiments. If the GRN 
settled on a point attractor then the node activations were 
converted into a bit string, where the ordering of the activations
was determined by unique node identifiers (Gene Regulatory
Network section). This GRN convergence test was
done to maintain consistency with previous work (Crombach
and Hogeweg, 2008; Draghi and Wagner, 2008). Bit-strings
were then converted into an ANN controller using the
genotype to ANN direct encoding mapping method (Binary
Encoding section, equation 2).
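The convergence test can be sketched as below: the network is run for a fixed number of iterations and accepted only if the last two states are identical. The threshold update rule inside `update` is the same assumed form as in the Gene Regulatory Network section, and the function names and the `None` return value for rejected GRNs are illustrative choices.

```python
def update(state, weights, thresholds):
    """Assumed synchronous threshold update (strict comparison is a guess)."""
    n = len(state)
    return [
        1 if sum(weights[i][j] * state[j] for j in range(n)) > thresholds[i] else 0
        for i in range(n)
    ]


def grn_to_bits(initial_state, weights, thresholds, iterations):
    """Return the final activations as a bit list, or None if no point attractor."""
    penultimate = list(initial_state)
    last = list(initial_state)
    for _ in range(iterations):
        penultimate, last = last, update(last, weights, thresholds)
    # Point attractor test: last and penultimate iterations must be identical.
    return last if last == penultimate else None
```

A single node with an inhibitory self-connection, for instance, flips state on every iteration and is therefore rejected, while a network that reaches a fixed state before the final iteration is accepted.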

Evolutionary Algorithm (EA) 

The binary and GRN encoded genotypes were evolved with 
an EA using deterministic tournament selection (Eiben and 
Smith, 2003), applied 200 times per generation. Also, the 
EA used mutation only (there was no recombination oper¬ 
ator). Table 4 presents the EA parameters. The following 
subsections detail the EA setup for controller evolution us¬ 
ing the binary and GRN encodings, respectively. 
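A single deterministic tournament can be sketched as follows. The tournament size of 4 is taken from table 4. How winners replace existing genotypes is not specified in this section, so the sketch covers selection only, and sampling contestants without replacement is an assumption.

```python
import random


def tournament_select(population, fitness, tournament_size=4, rng=random):
    """Pick `tournament_size` random contestants and return the fittest one."""
    contestants = rng.sample(range(len(population)), tournament_size)
    # Deterministic: the contestant with the highest fitness always wins.
    winner = max(contestants, key=lambda i: fitness[i])
    return population[winner]
```

In a deterministic tournament the only source of randomness is which contestants are drawn, so selection pressure is controlled by the tournament size relative to the population size.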

Direct Encoding. When the EA was started, a population
of bit-string genotypes was randomly generated. Each generation,
each genotype was systematically selected, decoded
into an ANN controller (Binary Encoding section) and tested
in the way-point navigation task (Simulation and Task Environment
Section) for one robot lifetime (table 1), after which
fitness was assigned to the tested controller (genotype). A
generation was complete once all genotypes had been tested
and evaluated. Selection and mutation operators were then applied
(table 4).

Parameter                                 Value
Robot speed                               5 units per iteration
Robot maximum angular velocity            0.5 radians per iteration
Robot heading                             [0, 2π)
Sensor range                              75 units
Collision radius                          20 units
Environment size                          400 x 200 units
Way-point radius (navigation task)        20 units
Number of way-points                      [0,59]
Simulation iterations (robot lifetime)    250

Table 1: Simulation and task parameter values.

Parameter                                 Value Range
GRN Weights (w_ij)                        [-2.0,2.0]
ANN Weights                               [-1.0,1.0]
ANN sensory and outputs                   [-1.0,1.0]
Thresholds (θ_i)                          [-3,3]
Number of nodes                           60
Incoming / outgoing connections per node  [0,59]
Simulation iterations (maximum t)         20

Table 2: Parameters for the Binary and Gene Regulatory
Network controller encoding.

Operator            Description
mutate_weight       A new value for the weight of an edge is chosen
                    from the allowed values.
mutate_activation   The value of the initial activation of a node is
                    flipped.
mutate_threshold    A new value for the threshold of a node is chosen
                    from the allowed values.
add_edge            A new incoming edge is added to the node. It
                    connects to a random node and has a random weight.
delete_edge         One of the node’s edges is chosen at random and
                    removed.

Table 3: Mutation operators for the Gene Regulatory Network.

In preliminary mutation operator testing it was found that
using a constant mutation rate for each bit (gene) in each
genotype resulted in a significantly lower task performance
compared to controller evolution with GRN encoding. That
is, at a high mutation rate, genotypes with high fitness were
quickly found; however, convergence was sub-optimal. For
relatively low mutation rates, the population converged to a
set of fit genotypes; however, genotypes with optimal fitness
were not found. To address this, we executed 100 evolutionary
runs of controller evolution with GRN encoding and
recorded the Hamming distance between parent and child
genotypes (Gene Regulatory Network Encoding section).
The probability of each Hamming distance (number of bit-flips)
occurring was then calculated and these probabilities
were used to determine the number of bit-flips in a mutation
of the binary direct encoding. That is, if on average, the associated
bit strings of GRNs had a Hamming distance of h
from their parent genotypes (with probability p), then when
a binary encoded genotype was mutated, h bits would be
flipped with probability p. This probability distribution was
then assigned as the mutation rate for the direct encoding approach.
Table 5 presents the probability of a given number of
bit-flips occurring whenever the binary encoding mutation
operator was applied. It was found, when these probabilities
were used (table 5), that the task performance of controller
evolution with direct binary encoding was comparable to the
task performance of early generations of controller evolution
using the GRN encoding (Results section).
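The matched mutation operator can be sketched as below: the number of bit-flips is drawn from the empirical distribution in table 5, and that many positions are flipped. The probabilities are copied from table 5. Flipping distinct positions (so that the child's Hamming distance equals the drawn value) and the function name are assumptions.

```python
import random

# Probability of 0, 1, ..., 25 bit-flips per mutation (values from table 5).
FLIP_PROBABILITIES = [
    0.5088, 0.2466, 0.1068, 0.0471, 0.0231, 0.0139, 0.0103, 0.0087, 0.0076,
    0.0065, 0.0053, 0.0042, 0.0032, 0.0023, 0.0017, 0.0012, 0.0009, 0.0006,
    0.0004, 0.0003, 0.0002, 0.0001, 0.0001, 0.0000, 0.0000, 0.0000,
]


def mutate(bits, rng=random):
    """Flip h distinct bits, where h is drawn from the table 5 distribution."""
    h = rng.choices(range(len(FLIP_PROBABILITIES)), weights=FLIP_PROBABILITIES, k=1)[0]
    child = list(bits)
    for position in rng.sample(range(len(child)), h):
        child[position] ^= 1
    return child
```

Under this distribution roughly half of all mutations leave the genotype unchanged, and about a quarter flip exactly one bit.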

Gene Regulatory Network Encoding. When the EA
was started, GRNs were randomly initialized with given parameter
constraints, and each GRN was simulated for a given
number of iterations (table 2). If the GRN settled on a
point attractor, then the GRN’s activations were mapped to a
bit-string, which was then decoded into an ANN controller
(Gene Regulatory Network Encoding section). Each decoded
ANN controller was then simulated in the way-point
navigation task (Simulation and Task Environment Section)
for one robot lifetime (table 1), and fitness was assigned to the
tested genotype. A generation was complete once all genotypes
had been tested and evaluated. Selection and mutation operators
were then applied. The mutation operators specified
in table 3 were applied to every node of the child
GRNs with the probability specified in table 4. However,
the mutate_weight operator was applied to every connection
of the GRNs with a lower probability (table 4).

Experiments, Results, Discussion 

Controller evolution was run for 10000 generations in the 
way-point navigation task, where robot controllers were en¬ 
coded using direct binary or indirect GRN encoding. In 
order to investigate the conditions facilitating evolvability 
in this evolutionary robotics case study, task variants were 
switched every 200 generations. Hence, given these two 
controller encodings, two different evolutionary setups were 
run, where each setup was run 100 times to ensure viability 
of statistical tests on results data. 

Figure 2 presents task performance (average and best
fitness) results for the way-point navigation task. Table 6
presents average and maximum fitness results, and table 7
presents statistical test results from pair-wise comparisons,
within and between direct and GRN encoded populations, on the
average maximum and average fitness results in table 6. The
Mann-Whitney U test (p < 0.01) with Bonferroni correction
(Flannery et al., 1986) for multiple comparisons was applied to
gauge statistical significance.

Parameter                              Value
Population size                        1000
Genotypes replaced per generation      100
Tournament size                        4
Recombination                          None
Generations per task variant switch    200
Number of generations                  10000
Binary encoding mutation               Bit-flip
Binary encoding mutation rate          See table 5
GRN mutation                           See table 3
GRN weight mutation rate               0.002
GRN node mutation rate                 0.02
Genotype bit-string length             60

Table 4: Evolutionary Algorithm Parameters

Bit flips    Probability    Bit flips    Probability
0            0.5088         13           0.0023
1            0.2466         14           0.0017
2            0.1068         15           0.0012
3            0.0471         16           0.0009
4            0.0231         17           0.0006
5            0.0139         18           0.0004
6            0.0103         19           0.0003
7            0.0087         20           0.0002
8            0.0076         21           0.0001
9            0.0065         22           0.0001
10           0.0053         23           0.0000
11           0.0042         24           0.0000
12           0.0032         25           0.0000
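The significance procedure named above can be sketched in plain Python. This is an illustrative implementation under stated assumptions, not the authors' analysis code: it uses the normal approximation to the Mann-Whitney U distribution with tie correction and no continuity correction, and the number of comparisons passed to the Bonferroni step is whatever the caller supplies, since the text does not state it.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U statistic of sample x, two-sided p-value). Tied values
    receive average ranks and the variance is tie-corrected.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                     # size of this tie group
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        in_x = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_x += avg_rank * in_x
        tie_term += t ** 3 - t
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, p_value


def bonferroni_significant(p_values, alpha=0.01):
    """Flag each comparison as significant at alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]
```

With 100 runs per setup the normal approximation is reasonable; for very small samples an exact test would be the safer choice.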

Table 6 presents average and maximum fitness results.
These fitness results are presented for early and late stages
of the evolutionary process. Early stages were at generations
25 and 425 and late stages were at generations 9225 and
9625. These generations were chosen as each lies an eighth of
the way into the generations allotted to task variant one,
and were deemed a good measure of the population’s early
response to the environmental change, both early and late in
the artificial evolution process.

One may note that we measure average and maximum fit¬ 
ness, rather than the rate of fitness change (relative fitness) as 
an indicator of a population’s responsiveness. We elected to 
use absolute fitness, since measuring relative fitness would 
unfairly benefit genotypes whose fitness dropped the most 
after a change in the task environment. That is, given two 
populations, the one with the highest fitness a given interval 
after a task change is concluded to be the most responsive. 
Also, this interpretation holds in the case that the other popu¬ 
lation suffers a greater fitness decrease after the task change 
which might subsequently lead it to having a faster fitness 
increase. 

In figure 2 one may observe that populations using the GRN
encoding significantly improved their responsiveness between
the early and late stages of the evolutionary process (table 7).
Here we use responsiveness and evolvability interchangeably to
mean the speed with which a population adapts given task
environment changes (task variants). The responsiveness of
populations using the direct encoding was statistically
comparable during both the early and late stages of the
evolutionary process (table 7).

Table 5: Mutation rates for direct binary encoding.
Probabilities match mutation probabilities of the binary GRN
encoding (for given number of bit flips). Genotypes were
sixty-gene bit-strings; however, mutations of more than
twenty-three bit flips never occurred.

Hence, statistical tests run between average and average
maximum fitness values of populations using the GRN encoding
in the early and late stages of evolution (table 7) confirm
that in the late stages of evolution, these populations
respond more quickly to environmental change (task variants).
Results indicate that populations using the GRN encoding
were significantly more responsive in the late generations
compared to populations using the direct encoding (table 7).
Also, there was no significant difference in the
responsiveness of populations using the direct encoding
between the early and late stages of evolution. This is
consistent with previous work (Crombach and Hogeweg, 2008;
Draghi and Wagner, 2008) and supports this study’s hypothesis
that direct genotype encoding does not facilitate
responsiveness in changing environments (Introduction
section). In the early stages of evolution, however,
populations using the direct encoding were more responsive,
in terms of average fitness, than populations using the GRN
encoding (figure 2).

In terms of the responsiveness of populations using the 
direct encoding, there was no significant difference between 
the average and average maximum fitness values of these 
populations between early and late stages of evolution (ta¬ 
ble 7). This indicates that direct binary encoded populations 
did not evolve a responsiveness to the changing task envi¬ 
ronment of this way-point navigation task, as was observed 
in the case of GRN encoded populations (table 7). However, 
in terms of the average fitness, the direct encoding was sig¬ 
nificantly more responsive in the early stages of evolution. 

Thus, results demonstrate that populations of robot
controllers (behaviors) using a GRN encoding evolve to become
more responsive to changes in their task environment. That is,
a statistically significant difference (for average and
maximum fitness values) was observed in the improvement of
responsiveness of GRN versus directly binary encoded
populations in the way-point navigation task. The contribution
is that results support previous work on the efficacy of GRN
encodings for conferring evolvability in changing task
environments (Crombach and Hogeweg, 2008), and extend
previous work (Crombach and Hogeweg, 2008; Draghi and
Wagner, 2008) into an evolutionary robotics task environment
with both time and space dynamics.

Figure 2: Average and maximum fitness for evolved GRN (top left, right) and binary encoded (bottom left, right) controllers in
the way-point navigation task.

A goal of this study was to investigate the environmental
and evolutionary conditions that facilitate the evolution
of evolvability. Previous researchers have demonstrated
that many-to-one genotype-to-phenotype mappings (redundant
mappings) result in evolvability in EAs (Shipman, 1999;
Ebner et al., 2001a) as well as increased EA task
performance. That is, a highly redundant mapping enables
some mutations to have negligible impact on the fittest
phenotypes, meaning the EA is better able to explore the
search space via neutral networks (Ebner et al., 2001b).

Redundancy, and the closely related notion of robustness
(Wagner, 2005), is theorized to have played a key role in
the increased responsiveness of evolved GRN encoded
behaviors, given that the GRN encodings are more redundant
than the direct encodings. During their evaluation, GRNs
are decoded into bit-strings before these bit-strings are
decoded into ANNs, which are then evaluated by the EA.
Given that the decoding from GRNs into bit-strings is a
many-to-one mapping, the redundancy in the GRN encoded
search space is considerably higher than that in the directly
encoded search space.

That is, the bit-strings are sixty characters long, which
implies that there are $2^{60} \approx 10^{18}$ genotypes in this space.
In the GRNs, each node has fifty-nine possible connections
and each connection can either connect to one of the sixty
nodes or not connect. This implies that each node has $59^{61}$
possible configurations. Note that although many of these
configurations are equivalent, where the only difference is
the ordering of the connections, each one forms a distinct
encoding and thus represents part of a distinct genotype. There
are sixty nodes, so the number of genotypes in this space is
$(59^{61})^{60} \approx 10^{6000}$.
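The orders of magnitude quoted above can be checked with a few lines of arithmetic. The sketch below works in base-10 logarithms to avoid huge integers. The first GRN figure uses the count exactly as printed in the text; the second is an alternative reading, labeled here as an assumption, in which each of the fifty-nine connections independently takes one of sixty-one states (sixty targets or unconnected).

```python
import math

BITSTRING_LENGTH = 60
N_NODES = 60
CONNECTIONS_PER_NODE = 59
STATES_PER_CONNECTION = 61  # one of sixty target nodes, or not connected

# Direct encoding: 2^60 genotypes.
log10_direct = BITSTRING_LENGTH * math.log10(2)

# GRN encoding as printed in the text: (59^61)^60.
log10_grn_as_printed = N_NODES * 61 * math.log10(59)

# Alternative reading (assumption): (61^59)^60.
log10_grn_alternative = (
    N_NODES * CONNECTIONS_PER_NODE * math.log10(STATES_PER_CONNECTION)
)

print(f"direct encoding:     10^{log10_direct:.1f}")
print(f"GRN (as printed):    10^{log10_grn_as_printed:.0f}")
print(f"GRN (alternative):   10^{log10_grn_alternative:.0f}")
```

Either reading yields a GRN genotype space thousands of orders of magnitude larger than the direct encoding's, which is the point the redundancy argument relies on.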

Also, consider that in this study’s way-point navigation
task, there were only nine fitness values, where fitness was
equated with how many of eight way-points a robot passed
in its lifetime. The ninth fitness value was to account for
the robot not passing any way-points (Simulation and Task
Environment Section). Thus, given the GRN encoding, nine
possible phenotypes (way-point navigating behaviors) were
represented by a high dimension and highly redundant
genotype space. That is, in this task, there were many possible
genotype to phenotype (controller) mappings, where
controller behavior was equated with one of nine fitness values.

Measure                          Fitness        Measure                          Fitness
Early maximum GRN encoding       0.88 (0.18)    Early average GRN encoding       0.33 (0.11)
Late maximum GRN encoding        0.96 (0.11)    Late average GRN encoding        0.45 (0.16)
Early maximum direct encoding    0.86 (0.17)    Early average direct encoding    0.39 (0.15)
Late maximum direct encoding     0.85 (0.17)    Late average direct encoding     0.39 (0.16)

Table 6: Maximum (left) and average (right) fitness and standard deviations (in parentheses) for way-point navigation for early
and late stages of evolution. Results have been normalized, where given values are portions of the minimum and maximum
possible task performance: 0 and 1.0, respectively. Early stages were at generations 25 and 425 and late stages were at generations
9225 and 9625.

Table 7: Statistical test results from pair-wise comparisons on average fitness results in table 6. / signifies a statistically
significant difference between two data-sets (p < 0.01) using the Mann-Whitney U test and Bonferroni correction. X signifies
that the difference between two data-sets is not significant and • signifies that a test was not done. EM is an abbreviation for
Early Maximum (average maximum fitness in early evolution), LM is Late Maximum, EA is Early Average (average fitness in
early evolution), LA is Late Average, and DE is Direct Encoding.


It is theorized that the larger size of the space of the GRN 
encoding is the cause of it exhibiting lower evolvability in 
the early stages of evolution. That is, given that phenotypes 
may not be uniformly distributed over the space (Pigliucci, 
2010; Parter et al., 2008), finding target phenotypes after ini¬ 
tialization may be more challenging. During the course of 
evolution, however, the population can move to areas of the 
space biased towards the targets. Moreover, other work has 
shown that direct encodings do exhibit a baseline of evolv¬ 
ability that is comparable to certain generative encodings 
(Tarapore and Mouret, 2015). 


This study’s results are also supported by related work 
(Ciliberti et al., 2007), that similarly modeled GRNs, where 
GRN instances were individual genotypes decoded into ex¬ 
pression patterns (phenotypes). Ciliberti et al. (2007) dis¬ 
covered that such a GRN encoding was robust (and redun¬ 
dant) as a large number of genotypic changes had no pheno¬ 
typic impact. 


To demonstrate this for our experimental results, current 
research is investigating the relationship between robust¬ 
ness, redundancy and evolvability for the GRN versus di¬ 
rectly encoded search spaces. This is being done for way- 
point navigation and more complex evolutionary robotics 
tasks. 


Conclusion 


This research presented an evolutionary robotics study that 
replicated and extended previous work testing the evolvabil¬ 
ity of populations of Gene Regulatory Networks (GRNs). 
Evolvability was defined as a population’s speed of adapta¬ 
tion to changing task environments. Direct binary encodings 
of robot controllers were compared to indirect GRN encod¬ 
ings in controller evolution to accomplish a way-point nav¬ 
igation task. Task variants were alternated during controller 
evolution to confirm previous results that GRNs facilitate 
the emergence of evolvability in environments with alternat¬ 
ing tasks. Results indicated that, for the GRN encoding ap¬ 
proach, populations became significantly more adapted to 
task variation over time, and thus evolvable. This was com¬ 
pared to a direct encoding of controllers which was unable 
to achieve a high level of evolvability in the same task en¬ 
vironment. This work thus demonstrates that the previous 
results are valid in a substantially more complicated domain 
and suggests approaches for aiding robots in dealing with 
dynamic environments. The findings were theorized to be a 
result of increased redundancy and robustness of the indirect 
GRN encoding of the search space. However, definitively 
demonstrating increased robustness and redundancy result¬ 
ing in increased evolvability for the GRN encoding of this 
and other more complex evolutionary robotics tasks remains 
the subject of ongoing research.

Abstract

Genetic spaces are often described in terms of fitness land¬ 
scapes or genotype-to-phenotype maps, where each potential 
genetic sequence is associated with a set of properties and 
connected to other genotypes that are a single mutation away. 
The positions close to a genotype make up its “mutational
landscape” and, in aggregate, determine the short-term
evolutionary potential of a population. Populations with wider
ranges of phenotypes in their mutational neighborhood tend
to be more evolvable. Likewise, those with fewer phenotypic
changes available in their local neighborhoods are more
mutationally robust. As such, forces that alter the distribution of
phenotypes available by mutation can have a profound effect
on subsequent evolutionary dynamics.

We demonstrate that cyclically-changing environments can 
push populations toward more evolvable mutational land¬ 
scapes where a wide range of alternate phenotypes are avail¬ 
able, though purely deleterious mutations remain suppressed. 
We further show that populations in environments with dras¬ 
tic changes shift phenotypes more readily than those in en¬ 
vironments with more benign changes. We trace this effect 
to repeated population bottlenecks in the harsh environments, 
which result in shorter coalescence times and keep popula¬ 
tions in regions of the mutational landscape where the pheno¬ 
typic shifts in question are more likely to occur. 

Introduction 

Fitness landscapes are a mathematical tool to map genetic 
sequences to expected evolutionary fitness. Many studies 
have examined the important role that different types of 
fitness landscapes play on evolutionary dynamics and out¬ 
comes, both in biological populations (Khan et al., 2011; 
Szendro et al., 2013; Weinreich et al., 2006; Nahum et al., 
2015) and in evolutionary computation settings (Merz and 
Freisleben, 2000; Humeau et al., 2013; Kallel et al., 2013). 
However, real-world fitness landscapes are far more com¬ 
plex and varied than the idealized models that are used in 
most of these studies. Neighboring regions of real land¬ 
scapes can have starkly different properties from each other 
based on the effects of and interactions among mutations 
(i.e., the mutational landscape). Examples of the types of
properties that we are interested in include robustness,
epistasis, and modularity, all of which are measurements of how
information is organized inside of a genome and are commonly
categorized as components of an organism’s “genetic
architecture”. Isolated pockets in a landscape can often be
characteristically different from the landscape as a whole due
to the amount and organization of genetic information. In
fact, in most natural fitness landscapes, the vast majority of
neighborhoods consist entirely of non-replicating genomes
with zero fitness (and thus no genetic information), making
life itself appear to be a rare exception (Gavrilets, 2004).

Evolution on these convoluted landscapes is clearly lim¬ 
ited to those regions that have non-zero fitness, with a selec¬ 
tive pressure for fitness to increase. Beyond that, however, 
populations can evolve toward neighborhoods with specific 
local properties based on the evolutionary forces acting upon 
the populations. For example, high mutation rates drive pop¬ 
ulations toward neighborhoods with a higher fraction of neu¬ 
tral mutations in the effect dubbed survival of the flattest 
(Wilke et al., 2001). Similarly, sexual populations tend to¬ 
ward regions of the fitness landscape with more modularity 
(Misevic et al., 2006) and more negative epistasis (Misevic 
et al., 2010) than otherwise equivalent asexual populations. 

Understanding these dynamics is of broad interest. It is 
important to evolutionary computation given the strong in¬ 
fluence of local landscape properties on the quality of the 
final solutions that an evolving population is able to obtain. 
Its relevance to evolutionary biology is equally obvious - 
the local landscape that a population occupies will influence 
the selective forces at play in the population, creating a feed¬ 
back cycle between these two important evolutionary factors 
(Zaman et al., 2014; Meyer et al., 2012). Disentangling such 
interactions is likely to provide further insights into funda¬ 
mental evolutionary dynamics. Computational artificial life 
systems have the advantage of being able to bridge these 
two realms: they have unconstrained evolutionary dynamics 
similar to natural systems, while maintaining the ability to 
rapidly perform experiments and collect any data we need 
about populations or their local landscapes.

Evolvability and Genetic Architecture

Evolvability refers to a series of distinct but overlapping
concepts that are generally concerned with adaptation,
variation, and/or novelty generation (Pigliucci, 2008). For the
purposes of this paper, we will focus on evolvability as the
capability of genomes to generate adaptive variation in
response to mutation. This kind of evolvability depends
primarily on the organization and interrelation of information
in the genome; that is, the genetic architecture, and the
resulting genotype-to-phenotype map (Gunter P. Wagner and
Altenberg, 1996). An example of evolvable architecture can
be found in some bacterial genomes that contain highly mu¬ 
table genome regions, called contingency loci. Small sets 
of insertions or deletions to these regions create transcrip¬ 
tion frameshifts that alter the expression of nearby coding 
regions, thus allowing populations to easily switch pheno¬ 
types via minor mutations. Contingency loci are most often 
seen in the genomes of pathogens, which are subject to fre¬ 
quent environmental shifts caused by the host immune sys¬ 
tem (Bayliss et al., 2001). Thus, these populations are able 
to produce large amounts of heritable variation despite the 
reduction in population diversity resulting from population 
bottlenecks. 

Mutational Landscapes Properties of genetic architec¬ 
tures such as evolvability and robustness are determined by 
the shape of the resulting mutational landscape (Andreas 
Wagner, 2008). Robust genetic architectures that can toler¬ 
ate more mutations without altering their phenotype reside 
in mutational landscapes that connect to more neutral mu¬ 
tants. Similarly, architectures that more easily switch phe¬ 
notypes in response to mutation without substantial reduc¬ 
tion in fitness, reside in more evolvable regions of genotype- 
space. 
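The two neighborhood properties described above can be made concrete with a toy example. The sketch below is illustrative only: the binary genotype, the hypothetical map `toy_phenotype`, and the function names are assumptions, and fitness is ignored, so the evolvability count here is cruder than the notion in the text, which also requires that the phenotype switch not substantially reduce fitness.

```python
def one_mutant_neighbors(genotype):
    """Yield every genotype that is a single bit flip away."""
    for i in range(len(genotype)):
        yield genotype[:i] + (1 - genotype[i],) + genotype[i + 1:]


def neighborhood_profile(genotype, phenotype_of):
    """Return (robustness, evolvability) of a one-mutant neighborhood.

    robustness:   fraction of neighbors sharing the genotype's phenotype
    evolvability: number of distinct other phenotypes among the neighbors
    """
    own = phenotype_of(genotype)
    neighbor_phenotypes = [phenotype_of(n) for n in one_mutant_neighbors(genotype)]
    robustness = sum(p == own for p in neighbor_phenotypes) / len(neighbor_phenotypes)
    evolvability = len(set(neighbor_phenotypes) - {own})
    return robustness, evolvability


def toy_phenotype(genotype):
    """Hypothetical map: count of set bits in the first four loci.

    The remaining loci are neutral and do not affect the phenotype.
    """
    return sum(genotype[:4])


genotype = (1, 0, 1, 0, 0, 0, 0, 0)
robustness, evolvability = neighborhood_profile(genotype, toy_phenotype)
```

In this toy case the four neutral loci make half of all single mutations silent, while the other four reach two alternative phenotypes.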

It is worth noting that not all regions of the mutational 
landscape are equally accessible. Some genome regions 
may be more resistant to mutation than others (Lee et al., 
2012), thereby altering the probabilities of mutations occur¬ 
ring that lead into certain regions of the mutational land¬ 
scape. This kind of differential probability may therefore 
moderate a population’s diffusion through the mutational 
landscape. Further, in regions of the landscape where there 
are fewer available mutations that provide potentially adap¬ 
tive traits, response to selection is likely to be weaker than 
in regions where there are many adaptive variants available 
within a few mutational steps (Alberch, 1991; Carter et al., 
2005). 

Changing environments create more paths to 
different kinds of phenotypes 

Directional selection acts to change the composition of phe¬ 
notypes and genotypes in a population (Wright, 1931). This 
change moves the population across the mutational land¬ 
scape to local regions of higher fitness. As populations ar¬ 


rive at a fitness peak, they tend to cluster there, and the ac¬ 
cumulation of new phenotype-altering mutations decreases 
(Wright, 1964; Kauffman and Levin, 1987). In changing en¬ 
vironments, however, the direction of selection is not fixed. 
Instead, as the environment changes, populations are driven 
to explore new regions of the mutational landscape (Kash- 
tan et al., 2007; Connelly et al., 2015). As they proceed, 
populations accumulate and carry with them the history of
prior explorations and adaptations, and use them as raw
material for new adaptation (McClintock, 1993). Indeed,
earlier work has shown that changing environments promote
evolvability in many contexts, without compromising
robustness (Crombach and Hogeweg, 2008; Wilke et al., 2001).
Strength of selection is also an important component of this 
exploration, since the harshness of the environment drives 
the speed with which organisms adapt to new conditions 
(Goddard et al., 2005). 

In this paper, we show how changing environments not 
only drive exploration of the mutational landscape, but also 
create populations whose genetic architectures are qualita¬ 
tively different than those from populations evolved in static 
environmental conditions. In particular, we show that pop¬ 
ulations evolved under harsh cyclically-changing environ¬ 
ments have many more changes along their phylogenetic 
histories than those evolved in static or benign changing en¬ 
vironments. They also individually contain large reservoirs 
of pseudogene-like vestigial loci that were acquired and de¬ 
activated through repeated adaptation and fixation cycles. 
As a result, populations evolved in these harsh cyclically- 
changing environments are low in standing neutral diversity 
at the population level, but they still connect with many more 
phenotypically-interesting regions of the mutational land¬ 
scape than more diverse populations evolved in static or be¬ 
nign environments. 

Digital Evolution 

Digital Evolution is a sub-field of Artificial Life that focuses 
on studying evolutionary dynamics using self-replicating 
computer programs as model organisms (McKinley et al., 
2008). Unlike theoretical simulations, digital organisms 
self-replicate, mutate, and compete with their peers for re¬ 
sources and space in which to reproduce. Because popula¬ 
tions of digital organisms have a source of variation, inheri¬ 
tance of genetic material across generations, and are subject 
to selective pressures, they undergo evolution by natural se¬ 
lection. 

Digital organisms do not suffer from many of the drawbacks
of experimentation on natural organisms. Three of the
advantages of digital organisms are particularly relevant for
our study. First, the rates of reproduction in digital systems
are much faster than in even the most rapidly-reproducing
physical organisms; we can get generations in seconds, rather
than the hours for the fastest biological organisms (Ryan,
1953; Lenski et al., 1991), or the days or even weeks needed
for complex multicellular organisms (Anderson et al., 2010;
Stearns et al., 2000). Second, using digital organisms allows
us to tightly control and verify experimental conditions. For
example, in physical organisms, factors such as mutation rate
can generally only be measured after the fact, or coarsely
altered through mutagens. In digital organisms, however, we
can control with fine-grained precision not only mutation
rates, but also the types and probabilities of different kinds
of mutations (e.g., point mutations, insertions, deletions).
Further, we are also able to track and replay the evolutionary
history of every organism at any point in time to verify that
unusual or unexpected results do not represent measurement
error. This ability to exactly replicate evolutionary results
at an individual organism level is firmly out of reach for
experiments with physical organisms. Finally, we can precisely
and exhaustively map the mutational landscape of a digital
organism, and identify the role of every site in its genome
(Ofria and Adami, 2002); this is not feasible in even the
simplest physical organisms. All of these factors make digital
organisms ideal for studying the effects of changing
environments on the mutational landscape.

Methods 

Avida Digital Evolution Platform 

We used Avida (Lenski et al., 2003) to examine the effects 
of cyclic changing environments on the genomes of evolved 
digital organisms. Avida is a software platform for perform¬ 
ing evolution experiments with digital organisms in a virtual 
world. 



Figure 1: An example virtual CPU from Avida, with a 
circular genome (blue), three registers (purple), input and 
output handlers (tan), and an instruction pointer (yellow) in¬ 
dicating the next instruction to be executed. 

An Avida organism is composed of a circular genome of 
assembly-like computer instructions that are executed in a 
virtual CPU (Figure 1). Populations of these organisms are 
placed in a toroidal world in individual cells where they are 
allowed to execute, reproduce, compete for space, mutate, 
and evolve. 


Organisms in Avida are self-replicating, and experience 
mutation. The genome in the initial default organism con¬ 
tains all of the instructions necessary for reproduction. How¬ 
ever, the instructions are not copied into an offspring with 
perfect fidelity. By default, the reproductive copy instruc¬ 
tion is faulty, meaning that it will probabilistically introduce 
errors (mutations) into the offspring genomes. These off¬ 
spring organisms carry and execute the mutations to their 
genomes, and in turn pass them on, along with new muta¬ 
tions, to their own offspring (i.e., variation in the systems is 
heritable). 

Avida worlds can be space- or resource-constrained. 
Avida allows the experimenter to configure many aspects 
of the environment, thus subjecting the organisms to vari¬ 
ous kinds of selective pressures. In many cases, these en¬ 
vironments will include resources that can be metabolized 
by performing specific functions or activities, resulting in 
a boost to execution speed that gives the organisms a com¬ 
petitive advantage. However, even without explicit external 
pressures, organisms still experience an implicit pressure to 
execute more quickly and efficiently. The organisms that run 
fastest are typically able to also reproduce fastest, and thus 
outcompete their peers for space. 

Thus, because populations have a source of variation,
inheritance, and experience selection, evolution by natural
selection is an inevitable consequence. Further, because the
Avida genome instruction set is Turing-complete¹, populations
may evolve potentially infinite complexity of behavior
(Ofria et al., 2002).

Avida is available for download without cost from
http://avida.devosoft.org/, and specific versions along with
data-files to reproduce the experiments described in this paper
may be found at https://github.com/voidptr/avida and
https://github.com/voidptr/alife2016.

Experimental Design 

We subjected a total of 150 replicate populations of digi¬ 
tal organisms to two different treatments of two-phase cycli¬ 
cal changing environments, plus a static control. The en¬ 
vironment cycles between equal-length periods of reward 
and punishment. Each cycle extends for 1000 updates, or 
roughly 30 generations. In the static control, there is no 
cycle. Rather, the rewards remain constant. The complete 
experiment extends for 200 cycles, or 200,000 updates, ap¬ 
proximately 6,000 generations. 

We structured the environment to provide large rewards to
organisms for performing two challenging bit-wise logical
tasks: XOR and EQU. XOR is rewarded with a CPU speed
(and thus fitness) multiple of 8, while EQU is rewarded with
a CPU speed multiple of 32. In the harsh treatment, as the
cycle progresses, the XOR reward remains constant, while
the EQU reward cycles between a 32-fold bonus and a
correspondingly harsh 32-fold penalty (i.e., CPU speed is
divided by 32 when EQU is performed in the off cycle). The
benign treatment is identical to the harsh treatment, except
that the reward merely goes away in the off-cycle as opposed
to incurring a severe penalty.

¹ The Avida instruction set is a super-set of the Tierra instruction
set, which has been shown to be Turing-complete (Maley, 1994).

We identify EQU as the Fluctuating Task. XOR, because 
it is rewarded continuously, is the Backbone Task , and is 
used as a background for comparing the separation or inter¬ 
twining of functional genetic components in the evolution of 
EQU. Further, the 4-fold difference in reward level between 
XOR and EQU encourages the evolution and maintenance 
of EQU when possible. 
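The reward schedule of the three treatments can be summarized in a short sketch. It is not Avida configuration and not the authors' code. The multipliers and cycle length come from the text; the assumptions are that the reward half of each cycle comes first, that the static control rewards EQU at all times, that the XOR and EQU effects combine multiplicatively, and the function names.

```python
CYCLE_LENGTH = 1000  # updates per full cycle (two equal-length phases)
XOR_BONUS = 8.0      # backbone task, rewarded continuously
EQU_BONUS = 32.0     # fluctuating task
TREATMENTS = ("control", "benign", "harsh")


def equ_rewarded(update):
    """True during the reward phase (assumed to be the first half-cycle)."""
    return (update % CYCLE_LENGTH) < CYCLE_LENGTH // 2


def speed_multiplier(update, treatment, does_xor, does_equ):
    """CPU speed multiplier for an organism at a given update."""
    if treatment not in TREATMENTS:
        raise ValueError(f"unknown treatment: {treatment}")
    multiplier = XOR_BONUS if does_xor else 1.0
    if does_equ:
        if treatment == "control" or equ_rewarded(update):
            multiplier *= EQU_BONUS   # reward phase, or static control
        elif treatment == "harsh":
            multiplier /= EQU_BONUS   # off phase: 32-fold penalty
        # benign off phase: EQU is simply unrewarded
    return multiplier
```

Under these assumptions an organism performing both tasks in the harsh off phase runs at 8/32 = 0.25 times base speed, slower than an organism that performs XOR alone, which is what makes losing EQU advantageous in that phase.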

For all of the experiments described in this section, we
held the individual genomes at a fixed length of 121
instructions², but mutated the new genome after each
successful replication event at a substitution probability of
0.00075 per site. We configured the Avida world to have local
interactions on a toroidal grid that is 60 cells by 60 cells
(3600 cells in total), and we seeded the initial populations
with an ancestor that was previously evolved to perform XOR
and EQU under a static reward. The genetic architecture for
performing XOR and EQU is tightly intertwined in this
ancestral organism, as it was evolved with no selective
pressure for modularity.
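To give a sense of scale for this mutation rate, the sketch below computes the expected number of substitutions per offspring genome and the probability that an offspring is an exact copy, then shows a per-site substitution step. The genome length and rate are from the text; the 26-letter instruction alphabet and the `replicate` function are placeholders for illustration, and letting a substitution redraw the same instruction is a simplifying assumption.

```python
import random

GENOME_LENGTH = 121
SUBSTITUTION_PROB = 0.00075  # per site, per replication

# Expected substitutions per offspring genome.
expected_substitutions = GENOME_LENGTH * SUBSTITUTION_PROB

# Probability that an offspring carries no substitution at all.
prob_unmutated = (1.0 - SUBSTITUTION_PROB) ** GENOME_LENGTH

# Placeholder instruction alphabet (hypothetical labels).
INSTRUCTIONS = list("abcdefghijklmnopqrstuvwxyz")


def replicate(genome, rng):
    """Copy a fixed-length genome, substituting each site independently."""
    return [
        rng.choice(INSTRUCTIONS) if rng.random() < SUBSTITUTION_PROB else inst
        for inst in genome
    ]


rng = random.Random(42)
ancestor = [rng.choice(INSTRUCTIONS) for _ in range(GENOME_LENGTH)]
offspring = replicate(ancestor, rng)
```

Roughly nine offspring in a hundred therefore carry a new substitution, so most births are exact copies and selection acts on a comparatively thin stream of mutants.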

Results and Discussion 

Our experiments demonstrate that digital organisms that 
were evolved in cyclic changing environments differ sub¬ 
stantially from those evolved in static environments in a 
number of ways. These differences include the number of 
mutations that fix in the lineage from the ancestor (the “phy¬ 
logenetic depth”), key metrics of their genetic architecture, 
and the presence of reservoirs of pseudogenes that change 
the nearby mutational landscape. 

Evolutionary History and Population Structure Evolution
in the harsh changing environment resulted in populations
with substantially higher phylogenetic depth as compared to
those evolved in static or benign environments. At each
environmental shift, adaptive mutations rapidly swept and
fixed in the populations (Figure 2).

The populations evolved in the control and benign envi¬ 
ronments displayed much more genetic diversity compared 
to those evolved in the harsh cyclic environment, which 
underwent what was effectively a bottleneck at each cycle 
shift. Because a selective sweep reduces current diversity 
within a population, the smaller number of sweeps in the 
benign and control treatments led populations in them to 

2 As part of the initial exploratory protocol, we hand-wrote an 
organism with separated sections that performed XOR and EQU. 
In order to compare the hand-written organism with a sample of 
evolved organisms, we matched their genome lengths, which were 
121 instructions. 
Figure 2: Phylogenetic depth of representative populations
evolved in each of the three treatments. White horizontal
lines mark the depth of the most recent common ancestor,
and discontinuities in this line indicate that the most
recent common ancestor has changed, and thus that a sweep
occurred. The control treatments had a mean of 18 sweeps
(STD=9.05), the benign treatments had a mean of 21
(STD=19.05), and the harsh treatments had a mean of 88
sweeps (STD=23.37). Note the difference in scales between
y-axes: the control-evolved population has a maximum depth
of 400 mutational steps from the ancestor, while the
harsh-evolved population has upward of 1100.


have higher standing diversity for most of their evolution¬ 
ary history than those populations from the harsh changing 
environment. Despite this higher standing diversity in the 
benign and control treatments, regions of low diversity are 
still evident in the genomes of these populations, implying 
purifying selection on the traits encoded at these sites (see 
Figure 3). 
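
The per-site entropy used to locate these regions of low diversity can be sketched as follows. This is a minimal illustration, assuming a population is a list of equal-length genomes and that entropy is the Shannon entropy (in bits) of the instructions observed at each locus; the function names and the toy population are hypothetical and are not taken from the experimental code.

```python
import math
from collections import Counter

def per_site_entropy(population):
    """Shannon entropy (bits) of the symbols observed at each genome site.

    `population` is a list of equal-length genomes (strings here).
    """
    n = len(population)
    entropies = []
    for site in range(len(population[0])):
        counts = Counter(genome[site] for genome in population)
        entropies.append(sum(-(c / n) * math.log2(c / n) for c in counts.values()))
    return entropies

def mean_entropy(population):
    """Mean of the per-site entropies: one diversity value per update."""
    site_h = per_site_entropy(population)
    return sum(site_h) / len(site_h)

# Toy population (hypothetical): site 0 is fixed, site 1 is split evenly.
toy_population = ["ab", "ab", "ac", "ac"]
site_entropies = per_site_entropy(toy_population)
population_mean = mean_entropy(toy_population)
```

A site at which every genome carries the same instruction has zero entropy, which is the signature of a recent sweep or of purifying selection at that locus.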

Genetic Architecture The selective shifts in both benign 
and harsh changing environments result in qualitatively dif- 
[Figure 3 plot. Panel titles: Population Entropy by Site and
Genotype; Control (Static) Environment]

Figure 3: Population Per-site Entropy over time. Each
vertical slice represents the per-site entropy of the
population at each update, both by genetic locus (upper),
and overall population mean (lower). Hotter colors (upper)
indicate greater diversity at this locus. Mean population
entropy indicates the relative diversity of the population
at any given time, while the per-site entropy shows where in
the genomes the population diversity is located.


ferent architectural styles from the static control environ¬ 
ment. The task arrangements evolved under both experimen¬ 
tal treatments are much more scattered than in the control. 
The bulk of the sites responsible for performing the
fluctuating task (EQU) were separated from the backbone task
(XOR), except for a core region of overlap, which represents
portions of the tasks that are shared between XOR and EQU
(Figure 4).
Figure 4: Varying genetic architecture of XOR and EQU 
over time for the final dominant genotype in a randomly 
selected replicate. Proceeding from the left of each figure, 
each vertical slice along the X-axis represents an ancestor 
of the final dominant. The Y-axis represents the tasks coded 
for at each genome locus. Sites in red are active sites that 
code for the XOR task only, sites in blue are active sites for 
the EQU task only, and purple sites code for both XOR and 
EQU. Knockouts to the sites in black are lethal to the or¬ 
ganism. Sites in the lighter colors (tan, light blue, lavender) 
represent vestigial sites for XOR only, EQU only, or both 
tasks, respectively. As we proceed from left to right, we can 
see the evolutionary history of the final dominant genotype. 

In contrast, the architecture of XOR and EQU remains
tightly intertwined in the control, and site positions do not
change significantly over the course of the experiment. In
the benign treatment, many more regions that perform the
fluctuating task (EQU) are scattered throughout the genome,
but site positions remain relatively static throughout the run
after an initial adaptive phase. In the harsh treatment, not
only are the active sites scattered, but the positions of active
sites change and proliferate wildly.

Interestingly, populations evolved in both the benign and 
harsh treatments also show development of a large reservoir 
of formerly functional, now vestigial, sites; that is, sites that 
were previously active in performing a task, but were dis¬ 
abled by a mutation elsewhere and are now neutral. How¬ 
ever, these vestigial pseudogene-like sites appear to be im¬ 
portant for allowing the organisms to quickly re-adapt as 
the fluctuations in the environment restore the previously- 
rewarded functions. (Figure 5) 


[Figure 5 plot. X-axis: Treatment Environment (Static
(control), Benign, Harsh)]

Figure 5: Number of functional and vestigial sites by 
treatment. The harsh environment has a significantly larger 
number of vestigial sites for the fluctuating (EQU) task com¬ 
pared to the benign treatment or control, while having a 
comparable number of functional sites (One-Way ANOVA
F(2,132) = 54.35, p << 0.0001).


Nearby mutational landscape In order to identify the 
role that these pseudogene-like structures play, we per¬ 
formed a survey of the single-step mutational landscape sur¬ 
rounding the last common ancestor of each replicate popu¬ 
lation. This landscape contained approximately 3,200 dis¬ 
tinct mutants in each of the 50 replicates per treatment, 
for a total of almost 500,000 mutants surveyed. We found 
that the availability of reservoirs of vestigial sites shifted 
the treatment-evolved organisms’ position in genotype space 
such that, compared to the control-evolved organisms, a task 
lost due to mutation more often remains one or two muta¬ 
tional steps away. In this way, the treatment organisms have 
an advantage over organisms from the control runs in terms 
of the short-term evolvability of the fluctuating task. (Fig¬ 
ures 6 and 7) 
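
The survey procedure can be sketched as below. This is a minimal illustration under stated assumptions: only point substitutions are enumerated, genomes are strings over a small alphabet, and `toy_task` is a hypothetical stand-in for testing whether an organism performs EQU. None of these names come from the experimental software.

```python
def one_step_mutants(genome, alphabet):
    """Yield every genome that differs from `genome` by one point substitution."""
    for i, original in enumerate(genome):
        for symbol in alphabet:
            if symbol != original:
                yield genome[:i] + symbol + genome[i + 1:]

def survey(genome, alphabet, performs_task):
    """Return (fraction of 1-step mutants that lose the task,
    fraction of 2-step mutants, reached through those losers, that regain it)."""
    first = list(one_step_mutants(genome, alphabet))
    losers = [m for m in first if not performs_task(m)]
    second = [m2 for m in losers for m2 in one_step_mutants(m, alphabet)]
    regained = sum(1 for m2 in second if performs_task(m2))
    frac_lost = len(losers) / len(first)
    frac_regained = regained / len(second) if second else 0.0
    return frac_lost, frac_regained

def toy_task(genome):
    """Hypothetical task test: the task is 'performed' if 'ab' occurs in the genome."""
    return "ab" in genome

frac_lost, frac_regained = survey("abc", "abc", toy_task)
```

With substitutions only, a genome of length L over an alphabet of A symbols has L(A-1) single-step mutants. Note that the second step includes the back-mutation to the original genome, which is counted as a regain in this sketch.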


[Figure 6 plot. Axis label: Fraction of 1-Step Mutants]



Figure 6: A survey of the single-step mutational neigh¬ 
borhood around organisms that performed the fluctuating 
task. Note that in both the benign and harsh treatments, 
there were significantly more mutants that lost the EQU task 
as compared to the control (Wilcoxon Rank Sum Test: Z = 
-6.59 and -6.70 respectively, p << 0.0001). This result indi¬ 
cates that it was easier for the organisms in both treatments 
to turn off the EQU task in response to one mutation. 

[Figure 7 plot. Axis labels: Fraction of 2-Step Mutants That
Regained EQU; Treatment Environment]

Figure 7: A survey of the two-step mutational neighbor¬ 
hood of the organisms that lost EQU function in the one- 
step survey. We found that in both the harsh and benign 
treatments, there were significantly more organisms that re¬ 
gained function in response to mutation than the control. 
(Wilcoxon Rank Sum Test: Z = -6.11 and -7.38 respectively, 
p << 0.0001). This result indicates that it was easier for the 
organisms in both treatment environments to regain the task 
in response to one additional mutation. 


Conclusion 

In cyclic changing environments, the direction of selection 
shifts frequently, and periodically drives populations to not 
only explore new regions of the genetic landscape, but also 
to carry with them the genetic heritage of previous environ¬ 
mental adaptations. Thus, the resulting populations are not 
only adapted to the local temporarily static environment, but 
also to the meta-environment of cyclic change. Because of 
their mutational history, and the paths that led them to their 
current region of genotype space, the genomes contain vesti¬ 
gial fragments of genetic material that were adapted to prior 
environments. As this exploration proceeds, more mutations 
accumulate in the population, and each of these creates a 
link to a new region of the mutational landscape. As these 
links accumulate, they form a reservoir of mobility for the 
population to quickly shift to new phenotypes as dictated by 
shifting selective forces. In this way, the accumulation of 
vestigial or pseudogene-like regions acts as an adaptation to 
the larger pattern of changing selective forces. 

By contrast, in static (non-changing) environments, the
majority of neutral mutations do not connect to as many
phenotypically interesting regions of genotype space. There
are far fewer pseudogene-like regions available that could
regain functionality should conditions change. Thus,
populations evolved in static environments are less evolvable
in the short term.

Limitations of Changing Environments and Future 
Directions 

Changing environments produce unique sets of selective 
pressures that promote more rapid exploration of genotype 
space, while also building useful reservoirs of partial func¬ 
tionality that may be co-opted in the evolution of more com¬ 
plex structures. These features make changing environments 
useful for both their explanatory power in natural evolu¬ 
tion, and as practical tools in the Artificial Life toolkit. Ul¬ 
timately, however, cyclic changing environments only re¬ 
tread existing phenotypic ground, and though genotypic ex¬ 
ploration is wider and faster than under purely directional or 
stabilizing selection, the space explored remains limited to 
the scope of the phenotypes that are being selected for. 

Even so, there must certainly exist other methods of ex¬ 
ploring genotype space that do not suffer from these limita¬ 
tions. For example, perhaps repeated bottlenecking of popu¬ 
lations could promote faster traversal of the fitness landscape 
in quasi-random directions. Another alternative may be ran¬ 
domly changing environments, rather than cyclic, which 
might produce a broader exploration of the wider genotype 
space. Finally, perhaps these kinds of environments could be 
coupled with dynamically increasing open-ended complex¬ 
ity goals. 

Understanding the mechanisms by which different types
of environments alter fitness landscapes is vital to
developing an understanding of the forces that promote
evolvability and increase complexity. Cyclic changing
environments provide one view into these dynamics, but we
must explore further to find other mechanisms for exploring
and exploiting genotype space.
Abstract 

Robustness and evolvability have traditionally been seen as 
conflicting properties of evolutionary systems, due to the fact 
that selection requires heritable variation on which to oper¬ 
ate. Various recent studies have demonstrated that organ¬ 
isms evolving in environments fluctuating non-randomly be¬ 
come better at adapting to these fluctuations, that is, increase 
their evolvability. It has been suggested that this is due to 
the emergence of biases in the mutational neighborhoods of 
genotypes. This paper examines a potential consequence of 
these observations, that a large bias in certain areas of geno¬ 
type space will lead to increased robustness in corresponding 
phenotypes. The evolution of boolean networks, which bear 
similarity to models of gene regulatory networks, is simu¬ 
lated in environments which fluctuate between task targets. 
It was found that an increase in evolvability is concomitant 
with the emergence of highly robust genotypes, where evolv¬ 
ability was defined as the population’s adaptability. Analysis 
of the genotype space elucidated that evolution finds regions 
containing robust genotypes coding for one of the target phe¬ 
notypes, where these regions overlap or are situated in close 
proximity. Results indicate that genotype space topology im¬ 
pacts the relationship between robustness and evolvability, 
where the separation of robust regions coding for the various 
targets was detrimental to evolvability. 


Introduction 

An open question in artificial and natural life is whether
digital and natural organisms undergoing an evolutionary
process are able to become better at adapting to the
selective pressures presented to them, that is, to become
more evolvable (Wagner and Altenberg, 1996a). However, this
question is complicated by multiple definitions of
evolvability. In both evolutionary biology (Pigliucci, 2008,
2010; Wagner, 2008; Parter et al., 2008) and Evolutionary
Computation (EC) (Tarapore and Mouret, 2014), for example,
evolvability refers to either populations or individuals
(Wilder and Stanley, 2015). Similarly, within EC (Eiben and
Smith, 2003) numerous definitions and associated metrics
have been proposed, for example those that focus exclusively
on solution fitness (Grefenstette, 1999; Reisinger and
Miikkulainen, 2007) or on the variability of offspring
(Lehman and Stanley, 2013). Tarapore and Mouret (2014)
developed a metric which incorporated both the fitness and
diversity of offspring. Reisinger et al. (2005) measured
evolvability as the ability of genotypes with various
representations to detect invariant patterns as
commonalities in a changing fitness function.

Two significant definitions from the biological literature¹
which are pertinent to this work are those of phenotypic
accessibility and adaptability. The former refers to the
proportion of phenotypes that can be accessed through
evolution (Wagner, 2008; Ciliberti et al., 2007a,b). This
proportion is determined by the topology of the genotype
space (that is, which other genotypes a given genotype can
directly mutate to) and the G→P mapping (which phenotype
each genotype codes for). Adaptability refers to the ability
of an organism to adapt to changes in the environment. It
has the advantage of not imposing an experimenter-biased
measure on the system, but merely asks whether evolution
delivers the goods (Budd, 2006; Pigliucci, 2008). A
prevailing hypothesis is that if the environment varies
sufficiently over time, then organisms evolve the ability to
evolve suitable adaptations to such environmental changes
faster (Wagner and Altenberg, 1996a; Wagner, 2008; Draghi
and Wagner, 2008). Crombach and Hogeweg (2008) as well as
Draghi and Wagner (2008) have demonstrated that
computational models of Gene Regulatory Networks (GRNs)
exhibit such evolvability.

The representation problem in EC addresses the issue of 
how to represent and adapt (mutate and recombine) geno¬ 
types in order that a broad range of complex solutions can be 
represented by relatively simple genotype encodings (Wag¬ 
ner and Altenberg, 1996b). The choice of representation 
and associated operators has a significant impact on the evo¬ 
lution of viable solutions and representations which facili¬ 
tate evolution have been termed evolvable (Wagner and Al¬ 
tenberg, 1996b; Rothlauf, 2006; Simoes et al., 2014). Sim¬ 
ilarly, in nature, genetic information defining the form and 

¹ For a review of evolvability in biology, the reader is
referred to Pigliucci (2008).
function of an organism is stored within its genotype;
however, the developmental process which translates this
information into phenotypes (the G→P map) is not well
understood (Pigliucci, 2010). Yet, it has become clear that
the G→P map is neither one-to-one nor linear (Gjuvsland
et al., 2013). In many organisms, and in Ribonucleic acid
(RNA) folding (Draper, 1992), it has been found that many
genotypes can code for a single phenotype and that genetic
change resulting from mutation is not proportional to
phenotypic change (Pigliucci, 2010; Wagner, 2008; Parter
et al., 2008).

These features of the G→P map have two important
consequences for evolutionary dynamics. The first is that they
allow for the emergence of robust genotypes. As many geno¬ 
types can code for one phenotype, it is possible that some 
number of a genotype’s mutational neighbors code for the 
same phenotype as it does, thus affording the genotype a de¬ 
gree of mutational robustness (Wagner, 2008). The second 
consequence is that it becomes plausible that certain phe¬ 
notypes are found in greater abundance in certain regions 
of the genotype space. This in turn results in genotypic 
mutations having non-random effects on phenotypes, open¬ 
ing up the possibility that the distribution of these effects is 
in some way advantageous (Hogeweg, 2012; Watson et al., 
2014; Parter et al., 2008; Meyers et al., 2005; Gerhart and 
Kirschner, 2007; Pavlicev et al., 2010). 

We hypothesize that both robustness and mutational bias
facilitate evolvability; however, the complex relationship
between robustness and evolvability makes this non-trivial
to elucidate. Recent work has indicated that a certain
degree of robustness greatly benefits evolvability,
interpreted either as phenotypic accessibility (Wagner,
2008; Ciliberti et al., 2007a) or as adaptation to a goal
(Draghi et al., 2010).
These studies are predicated on the assumption that a cer¬ 
tain proportion of genotypes are non-viable and will never 
produce offspring. This constrains the portion of the geno¬ 
type space that is accessible from a given genotype. That 
is, once robustness rises above a certain threshold, geno¬ 
types of a given phenotype get connected in large neutral 
networks which can be traversed in order to access a large 
variety of phenotypes and this access to variation allows for 
faster adaptation towards a stationary task target. 

Mutational biases, however, can increase the likelihood 
of a given phenotype occurring in the neighborhood of a 
given genotype, thus facilitating adaptation towards it. Var¬ 
ious recent studies have demonstrated that organisms evolv¬ 
ing in environments fluctuating non-randomly are able to be¬ 
come better at adapting to these fluctuations, thus increasing 
their evolvability (Crombach and Hogeweg, 2008; Draghi 
and Wagner, 2008). It has been suggested that this is due 
to the emergence of biases in the mutational neighborhoods 
of genotypes (Hogeweg, 2012; Watson et al., 2014). 

Hence, this study aims to examine a potential conse¬ 
quence of these biases, which is that increasing the bias 


Parameter                            Value Range

Weights (w_ij)                       [-2,2]
Thresholds (θ_i)                     [-3,3]
Number of nodes                      20
Incoming connections per node        [1,10]
Simulation iterations (maximum t)    6

Table 1: Parameters for the networks composed of nodes
with threshold functions

Parameter                            Value Range

Number of nodes                      20
Incoming connections per node        2
Simulation iterations (maximum t)    6

Table 2: Parameters for the networks composed of nodes
with NAND functions

towards a small number of phenotypes within a region of 
the genotype space will increase the robustness of the geno¬ 
types within that region. Experiments were conducted using 
the simulated evolution of boolean networks in a fluctuat¬ 
ing environment. These boolean networks were comprised 
of nodes containing either NAND logic gates or threshold 
functions. The networks composed of threshold functions 
closely resemble models of gene regulatory networks used to 
demonstrate the emergence of evolvability in varying envi¬ 
ronments (Crombach and Hogeweg, 2008; Draghi and Wag¬ 
ner, 2008). The networks composed of NAND logic gates 
were similar to those used to demonstrate the emergence of 
modularity (Kashtan and Alon, 2005) as well as task perfor¬ 
mance speedup (Kashtan et al., 2007) in fluctuating environ¬ 
ments. 

Results indicate that an increase in evolvability is con¬ 
comitant with the emergence of highly robust genotypes. 
Analysis of the genotype space elucidated that evolution 
finds regions containing robust genotypes coding for one of 
the target phenotypes and that these regions are either over¬ 
lapping or situated in close proximity. It was further found 
that the greater the separation between robust regions, the 
greater the drop in fitness during target changes. This
implies that less separated robust regions allow greater
evolvability to emerge.

Methods 

Network Models 

Networks were formed of nodes which could hold an activa¬ 
tion value of either zero or one. Each node had an activation 
function which was either a threshold or a NAND function. 
The activations of other connecting nodes were used as the 
inputs to these functions, the specification of which nodes’ 
Operator             Description

mutate_weight        A new value for the weight of an edge is
                     chosen from the allowed values.
mutate_connection    An incoming connection to a node is given
                     a different source.
mutate_threshold     A new value for the threshold of a node is
                     chosen from the allowed values.
add_edge             A new incoming edge is added to the node.
                     It connects to a random node and has a
                     random weight.
delete_edge          One of the node's edges is chosen at
                     random and removed.

Table 3: Mutation operators for the networks.


Parameter                              Value

Population size                        5000
Births per generation                  5000
Tournament size                        10
Recombination                          None
Generations per task variant switch    10
Number of generations                  500
Mutation                               See table 3
NAND connection mutation rate          0.05
Threshold connection mutation rate     0.005
Node mutation rate                     0.02

Table 4: Evolutionary Algorithm Parameters


Setup                  Silhouette score

NAND, pair one         0.27 (0.08)
NAND, pair two         0.08 (0.05)
Threshold, pair one    0.06 (0.03)
Threshold, pair two    0.06 (0.05)

Table 6: Silhouette scores for generated genotypes in the
neighborhood of the best genotype at the end of multiple
runs for each setup. Standard deviations are in parentheses.


Target One Target Two 

Pair One 0000011001100000 0110111111110110 

Pair Two 0000011001100000 0000111001100100 


Table 5: Task target pairs used. The bit strings represent the
desired outputs for each input permutation, where the
permutations are ordered by their integer interpretation. Thus,
for instance, the first bit of the string represents the desired
output for the permutation 0000, the last bit for permutation
1111 and the fifth bit for permutation 0100.


activations to use thus implied the connectivity of the
network. Updates were done synchronously. The threshold
function used is specified in equation 1.

\[
s_i(t+1) =
\begin{cases}
0 & \text{if } \sum_j w_{ji}\, s_j(t) < \theta_i \\
1 & \text{otherwise}
\end{cases}
\qquad (1)
\]

Here, s_i(t) is the activation of the ith node at simulation
iteration t, w_ij is the connection weight of the directed edge
from the ith to the jth node, and θ_i is the threshold of the
ith node. If no such connection exists then w_ij = 0. Table 2
presents the parameters of the NAND networks and table 1
presents the parameters of the threshold networks.

Mutation Operators

Table 3 specifies the mutation operators used in the
evolutionary algorithm applied to evolve the networks. The
mutate_weight operator was applied to the GRN node
connections, where for every connection, the operator was
applied with a given probability (table 4). All other
mutation operators acted on nodes' activation and threshold
functions with a given probability (table 4). In the
evolution of networks with NAND functions only the
mutate_connection operator was used, whereas all the
operators were used in the evolution of networks with
threshold functions.



Evaluation 

Networks were evolved to perform specific boolean functions
with four inputs and one output. Each network was therefore
run sixteen times, once for each of the possible input
permutations. For each run, the nodes designated as the
input nodes had their activations set to the values of the
given input permutation, and their activations were clamped
to these values for the duration of the run. The network was
then simulated for the number of time steps specified in
tables 1 and 2.

At the end of the simulation, the value of a designated
output node was read and, if it matched the output value of
the target function for the given input, the fitness of the
network was incremented by one. Networks could therefore
have fitness values in the range [0,16].
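
The evaluation procedure can be sketched for a NAND network as follows. This is a minimal illustration, not the authors' implementation: the genotype encoding (one pair of source indices per node), the zero initial activation of non-input nodes, the choice of the last node as output, and the 14-node example network are all assumptions made for the sketch.

```python
from itertools import product

def run_nand_network(sources, inputs, steps=6, output_node=-1):
    """Simulate a NAND network synchronously and return the output activation.

    sources[i] = (a, b): node i computes NAND of nodes a and b.
    The first len(inputs) nodes are input nodes and stay clamped.
    Assumption: all non-input nodes start at activation 0.
    """
    n_inputs = len(inputs)
    state = list(inputs) + [0] * (len(sources) - n_inputs)
    for _ in range(steps):
        nxt = [1 - (state[a] & state[b]) for a, b in sources]
        nxt[:n_inputs] = inputs  # clamp the input nodes
        state = nxt
    return state[output_node]

def fitness(sources, target, steps=6):
    """Count the input permutations (ordered 0000..1111) with correct output."""
    score = 0
    for k, inputs in enumerate(product((0, 1), repeat=4)):
        if run_nand_network(sources, inputs, steps) == int(target[k]):
            score += 1
    return score

# Hand-built example (hypothetical): (a XOR b) AND (c XOR d) from NAND gates.
# Nodes 0-3 are the inputs a, b, c, d; their source entries are ignored.
xor_and_net = [(0, 0)] * 4 + [
    (0, 1), (0, 4), (1, 4), (5, 6),      # nodes 4-7: node 7 = a XOR b
    (2, 3), (2, 8), (3, 8), (9, 10),     # nodes 8-11: node 11 = c XOR d
    (7, 11), (12, 12),                   # node 13 = node 7 AND node 11
]
fitness_on_target_one = fitness(xor_and_net, "0000011001100000")
fitness_on_target_two = fitness(xor_and_net, "0110111111110110")
```

The deepest gate in this example is five layers from the inputs, so six synchronous iterations are enough for the output to settle.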

Each evolutionary run had a pair of target functions. The 
target against which networks were evaluated alternated be¬ 
tween the members of this pair. The target pairs are speci¬ 
fied in table 5. Target one of pair one is the function (a XOR 
b) AND (c XOR d) and target two of pair one is (a XOR 
b) OR (c XOR d). This was done to maintain consistency 
with previous work on evolving boolean networks in fluctu¬ 
ating environments (Kashtan and Alon, 2005; Kashtan et al., 
2007). 

Target one of pair two is the same as in pair one; however,
the second target was chosen to differ randomly in two
outputs, as can be seen in table 5. This choice was made so
as to ascertain the effect of alternating between targets
that are more similar. Moreover, these various experiment
parameters were chosen because preliminary testing showed
that they were conducive to the emergence of evolvability.
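
The bit-string encoding of the targets can be checked with a short sketch. It assumes that the input permutation abcd is read with a as the most significant bit; the helper names are illustrative only.

```python
from itertools import product

def truth_table(func, n_inputs=4):
    """Encode a boolean function as a bit string, one bit per input permutation.

    Permutations are ordered by their integer interpretation (0000 first,
    1111 last), with the first argument as the most significant bit.
    """
    return "".join(str(int(func(*bits))) for bits in product((0, 1), repeat=n_inputs))

def target_one(a, b, c, d):
    return (a ^ b) & (c ^ d)

def target_two(a, b, c, d):
    return (a ^ b) | (c ^ d)

pair_one = (truth_table(target_one), truth_table(target_two))

# Pair two as listed in table 5; count the outputs in which its targets differ.
pair_two = ("0000011001100000", "0000111001100100")
differing_outputs = sum(x != y for x, y in zip(*pair_two))
```

Under this ordering the generated strings reproduce pair one of table 5, and the two targets of pair two differ in exactly two outputs.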

Evolutionary Algorithm 

The networks were evolved with an Evolutionary Algorithm 
(EA) using tournament selection and elitist survivor selec¬ 
tion (Eiben and Smith, 2003). Also, the EA used mutation 
only (there was no recombination operator). At each gen¬ 
eration, 5000 tournaments of ten genotypes were created. 
The winner of each tournament went on to produce a single 
child. The fittest 5000 of the 10000 genotypes composed of 
children and current generation’s population went on to form 
the subsequent generation’s population. Table 4 presents the 
EA parameters. The choices of selection operators as well 
as algorithm parameters were made as preliminary experi¬ 
ments showed that they were conducive to the emergence of 
evolvability. 
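
One generation of this scheme can be sketched as below. It is a minimal illustration: whether tournament contestants were drawn with or without replacement is not stated, so sampling without replacement is an assumption here, and the bit-tuple genotype, `flip_one_bit` and the use of `sum` as a fitness function are hypothetical stand-ins.

```python
import random

def next_generation(population, fitness, mutate, rng, tournament_size=10):
    """One generation: tournament parent selection, mutation only, elitist survival."""
    n = len(population)
    children = []
    for _ in range(n):
        # Assumption: contestants are sampled without replacement.
        contestants = rng.sample(population, tournament_size)
        winner = max(contestants, key=fitness)
        children.append(mutate(winner, rng))
    # Elitist survivor selection: keep the fittest n of parents plus children.
    combined = population + children
    combined.sort(key=fitness, reverse=True)
    return combined[:n]

def flip_one_bit(genotype, rng):
    """Toy mutation operator (hypothetical): flip one randomly chosen bit."""
    i = rng.randrange(len(genotype))
    return genotype[:i] + (1 - genotype[i],) + genotype[i + 1:]

rng = random.Random(0)
population = [tuple(rng.randint(0, 1) for _ in range(16)) for _ in range(50)]
evolved = next_generation(population, sum, flip_one_bit, rng)
```

Because the parents compete with their children for survival, neither the best nor the mean fitness of the population can decrease from one generation to the next while the target stays fixed.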

Experiments 

For each of the two network types, evolution was run twenty 
times for 750 generations for each of two different task tar¬ 
get pairs. Thus, four different evolutionary setups were each 
run twenty times. During evolution the goal was switched 
between the two targets of the pair every ten generations. 

During each evolutionary run, at the end of each generation,
if there was at least one genotype in the population with
maximum possible fitness, then a test on the average
robustness of such genotypes was run. That is, if at least
one genotype had reached the current target, as specified
in table 5, then such a test was run. This test consisted of
randomly drawing 200 such genotypes from the population,
allowing for a given genotype to be drawn multiple times,
and then creating 15 mutated copies of each drawn genotype,
using the same EA mutation operator and parameters as
specified in the Evolutionary Algorithm section. The average
robustness of the maximum fitness genotypes was then
computed as the proportion of these mutated copies which
were also of maximum fitness.
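
The robustness test can be sketched as follows. This is a minimal illustration: the `fitness` and `mutate` arguments stand for the evaluation and mutation procedures described above, and the toy genotypes and operators in the usage lines are hypothetical.

```python
import random

def average_robustness(population, fitness, mutate, rng,
                       max_fitness=16, n_draws=200, n_copies=15):
    """Proportion of mutated copies of maximum-fitness genotypes that keep
    maximum fitness. Returns None if no genotype has reached the target."""
    perfect = [g for g in population if fitness(g) == max_fitness]
    if not perfect:
        return None
    kept = 0
    for _ in range(n_draws):
        parent = rng.choice(perfect)  # drawn with replacement
        for _ in range(n_copies):
            if fitness(mutate(parent, rng)) == max_fitness:
                kept += 1
    return kept / (n_draws * n_copies)

# Toy usage (hypothetical): genotypes are 16-bit tuples, fitness is the bit sum.
rng = random.Random(0)
perfect_genotype = (1,) * 16
flawed_genotype = (0,) + (1,) * 15

def identity(genotype, rng):
    return genotype

def break_first_bit(genotype, rng):
    return (0,) + genotype[1:]

fully_robust = average_robustness([perfect_genotype, flawed_genotype], sum, identity, rng)
not_robust = average_robustness([perfect_genotype], sum, break_first_bit, rng)
undefined = average_robustness([flawed_genotype], sum, identity, rng)
```

With the stated parameters each test evaluates 200 x 15 = 3000 mutated copies per generation.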

Results and Discussion 

Figure 1 presents plots of the maximum fitness, average fit¬ 
ness and robustness of maximum fitness genotypes averaged 
over the 20 runs for each of the four setups. In early gener¬ 
ations [0, 150] the average and maximum fitness respond 
more slowly to changes in the task target as compared to 
later generations [710, 750]. A further observation is that 
the average robustness of the maximum fitness genotypes 
that are present in the population increases over evolution¬ 
ary time. 

A useful statistic in this context is the correlation between 
the average robustness achieved after evolving towards one 
goal for the allotted ten generations and how quickly the 
population is able to adapt back to that goal after it has spent 
the next ten generations evolving towards the other goal. 
That is, we need to test the hypothesis that robustness and 
evolvability are correlated. In order to measure rate of adap¬ 
tation, the average fitness two generations after the change 
back to the original goal was recorded. 

We measure average absolute fitness, rather than the rate 
of fitness change (relative fitness) as an indicator of a popu¬ 
lation’s evolvability. We elected to use absolute fitness, since 
measuring relative fitness would unfairly benefit genotypes 
whose fitness dropped the most after a change in the task 
environment. That is, given two populations, the one with 
the highest fitness a given interval after a task change is con¬ 
cluded to be the most evolvable. Also, this interpretation 
holds in the case that the other population suffers a greater 
fitness decrease after the task change, which might subse¬ 
quently lead it to having a faster fitness increase. 

Thus, for each of the four setups, the Pearson correlation
(Freedman et al., 2007) was applied between the average
robustness of the maximum fitness genotypes and the average
fitness early into the evolution back to the goal. Specifically,
the correlation measure was applied between robustness at 
the end of each period (where genotypes evolve towards the 
first target of the pair), and the average fitness two genera¬ 
tions into the subsequent period evolving back towards this 
target. Results deriving from periods where no genotypes 
in the population were of maximum fitness (at the end of a 
period of evolution towards the first target) were excluded 
from the correlation computation. That is, robustness could 
not be calculated in these instances. It was found that in 
all four setups, there was a positive correlation between the 
robustness achieved and evolvability (p < 0.01). This sup¬ 
ports our hypothesis that robustness and evolvability, defined 
as adaptability, are concomitant phenomena. 
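
The pairing and correlation step can be sketched as below. The numbers in the usage lines are invented purely to exercise the code and are not experimental values; `None` marks a period in which no genotype reached maximum fitness, and the significance test is omitted from this sketch.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def paired_observations(robustness_by_period, fitness_after_return):
    """Drop periods in which robustness could not be calculated (None)."""
    pairs = [(r, f) for r, f in zip(robustness_by_period, fitness_after_return)
             if r is not None]
    return [r for r, _ in pairs], [f for _, f in pairs]

# Invented illustration data: robustness at the end of each period towards the
# first target, and average fitness two generations after returning to it.
robustness = [0.1, None, 0.3, 0.5]
fitness_two_generations_in = [10.0, 9.0, 12.0, 14.0]
xs, ys = paired_observations(robustness, fitness_two_generations_in)
r = pearson_r(xs, ys)
```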

A further pertinent question was the structure of this ro¬ 
bustness across genotype space. It is plausible that evolution 
could have found an area of robust genotypes for both targets 
interspersed. 
[Figure 1 plots. Panels: NAND Nodes, Target Pair One; NAND
Nodes, Target Pair Two; Threshold Nodes, Target Pair One;
Threshold Nodes, Target Pair Two]

Figure 1: Plots of the population average and maximum fitness as well as the average robustness of maximum fitness genotypes 
averaged over the 20 runs for each of the four setups. The left column contains plots from early generations [0, 40], the middle 
from slightly later generations [110, 150] and the right from generations near the end of the run [710, 750]. 
Alternatively, it could be alternating between separate
areas of robustness, where these areas are either adjacent or
separated by a greater distance and connected by a strong
fitness gradient. In order to ascertain this structure,
visualizations of the genotype space were constructed using
the t-SNE algorithm (Van der Maaten and Hinton, 2008).

For each of the four setups, evolution was run for 210 gen¬ 
erations, where this number was chosen so that the popula¬ 
tion would be between robust regions (assuming their exis¬ 
tence). At the end of these runs, mutated copies of the fittest 
genotype in the population were created using the same EA 
mutation operator and parameters as specified in the
Evolutionary Algorithm section, although all mutation
probability parameters were increased by a factor of two. This was
because preliminary testing found smaller mutation proba¬ 
bilities caused the generation of sufficient target phenotypes 
to be too computationally expensive. 

These mutated copies were created until there were 300
distinct genotypes which coded for each of the two target
phenotypes. These genotypes were subsequently fed into the
t-SNE algorithm, which arranged them in a two-dimensional
(visualized) space so as to preserve, as far as possible,
their distances in the higher-dimensional genotype space.
That is, genotypes which were closer together in the
genotype space formed clusters in the visualization.

The metric used in the t-SNE algorithm was the Hamming 
distance between genotypes. This is because two networks 
which differ only in the wiring of one connection should al¬ 
ways be considered to be the same distance apart, regardless 
of the numerical identification values assigned to the nodes. 
A similar argument can be made for weights and thresholds. 
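
A sketch of the distance computation is given below. It assumes that a genotype has been flattened into a fixed-length sequence of connection sources (and, for threshold networks, weights and thresholds); this encoding is an assumption for illustration. The resulting matrix is the kind of input that a t-SNE implementation accepting precomputed distances could take.

```python
def hamming(g1, g2):
    """Number of positions at which two equal-length genotypes differ."""
    if len(g1) != len(g2):
        raise ValueError("genotypes must have the same length")
    return sum(a != b for a, b in zip(g1, g2))

def distance_matrix(genotypes):
    """Symmetric matrix of pairwise Hamming distances."""
    return [[hamming(a, b) for b in genotypes] for a in genotypes]

# Rewiring one connection counts as distance 1 whether the new source is
# node 4 or node 17, unlike a numerical difference of the node labels.
base = (0, 1, 2, 3)
rewired_near = (0, 1, 2, 4)
rewired_far = (0, 1, 2, 17)
distances = distance_matrix([base, rewired_near, rewired_far])
```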

The resultant visualizations are displayed in figure 2. 
When NAND networks were being used on target pair one 
there was a visible separation between the two robust re¬ 
gions. However, there was less separation in the other runs. 

In order to gain a more quantitative understanding of this 
separation, the silhouette score (Rousseeuw, 1987) for each 
data set was computed. The silhouette score is a measure of 
the efficacy of a clustering algorithm. Scores are in the range
[-1,1]: a positive score indicates distinct clusters, a
negative score indicates data points being placed in the wrong
cluster, and a score of zero indicates overlapping clusters.
Thus, in this instance, the silhouette score measured the ef¬ 
ficacy of the clustering of the genotype space, where corre¬ 
sponding phenotypes were the labels used in the silhouette 
score calculation. 
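
As a rough sketch of the calculation (not the code used for the experiments), the silhouette score can be computed directly from a precomputed distance matrix and a list of labels. Here the labels play the role of the phenotypes; the function name and the toy data are illustrative assumptions.

```python
def silhouette_score(dist, labels):
    """Mean silhouette over all points, given a distance matrix and labels."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("at least two distinct labels are required")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(labels)


# Toy data: four points on a line, forming two well-separated pairs.
points = [0, 1, 10, 11]
dist = [[abs(p - q) for q in points] for p in points]
separated = silhouette_score(dist, [0, 0, 1, 1])
mislabelled = silhouette_score(dist, [0, 1, 0, 1])
```

With labels that match the two pairs the score is close to 1; with labels that cut across the pairs it is negative, matching the interpretation given above.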

In order to facilitate statistical comparisons, this process 
was run 20 times. The results of these runs are displayed in 
table 6. The score for the NAND networks on target pair one 
is larger than for the other setups. Furthermore, this differ¬ 
ence is statistically significant (p < 0.01, Mann-Whitney U 
test with Bonferroni correction (Flannery et al., 1986)). 
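
The comparison can be sketched as follows. This is a minimal illustration that computes only the Mann-Whitney U statistics and a Bonferroni-adjusted significance level; the p-value calculation is omitted, and the sample values and the count of three pairwise comparisons (NAND on target pair one against each of the other setups) are assumptions made for the example.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples, using average ranks for ties."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    return u_x, len(x) * len(y) - u_x


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under the Bonferroni correction."""
    return alpha / n_comparisons


# Hypothetical silhouette scores for two setups (not the reported data).
u_first, u_second = mann_whitney_u([0.8, 0.9, 0.7], [0.1, 0.2, 0.3])
threshold = bonferroni_alpha(0.01, 3)
```

When every value in the first sample exceeds every value in the second, U for the first sample equals the product of the sample sizes and U for the second is zero.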

The results indicate that an increase in evolvability is
concomitant with an increase in the robustness of genotypes
coding for the target phenotypes. This can be observed in the
plots of fitness and robustness displayed in figure 1, where
rapid adaptation to new targets coincided with increased
robustness. The link between evolvability and robustness is
further supported by a statistically significant correlation
between the two.

These results also support the hypothesis that increased 
evolvability is driven by biases in regions of genotype space 
(Hogeweg, 2012; Watson et al., 2014) and the inference that 
a strong bias towards certain genotypes will increase the ro¬ 
bustness of these genotypes. Furthermore, finding a geno¬ 
type within a given region will be aided by an abundance 
of this genotype. Thus, as it can be argued that evolvability 
implies robustness and robustness implies evolvability, the 
position of this paper is that the two phenomena are linked 
and occur in tandem. 

Visualizations of the genotype space, as shown in figure 
2, demonstrated that the nature of these biases can be non¬ 
trivial. That is, evolution does not necessarily settle on a 
region which is uniformly biased towards both of the target 
phenotypes. Although this case was observed, the corresponding
case of adjacent regions, each biased towards one of the
phenotypes, was also seen. The implications of these
visualizations were supported by the significant difference in the
silhouette scores between these cases. 

Furthermore, the difference between these cases had 
a strong influence on the evolutionary dynamics, with a 
greater non-uniformity of the bias corresponding with larger 
drops in population fitness during task target changes. 

It is theorized that the separation of the robust regions in 
the evolution of NAND networks on target pair one is due to 
the fact that, in that setup, phenotypes of either target are less 
likely to occur in close proximity. Nodes in the NAND net¬ 
works had fewer possible connections than those in thresh¬ 
old networks (tables 1 and 2). Moreover, their thresholds 
and connection weights were not subject to mutation. This 
meant that the genotype space was of a much lower dimen¬ 
sion, reducing the chance of two given phenotypes being 
nearby. Furthermore, the targets in pair one were more dis¬ 
similar. Should this hypothesis be correct, the implication is 
that the evolution of robust boolean networks is facilitated 
by a high dimension genotype space. 

Moreover, these observations have led to further research 
questions that are the subject of current work. Examples
include confirming the above hypothesis, determining how large
the distance between separated regions can be, and establishing
whether analogous structures emerge when evolution fluctuates
between larger numbers of task targets. We are also
investigating the relationship between these results and the
way robustness facilitates adaptation by increasing the
accessibility of the genotype space (Draghi et al., 2010).

Conclusion 

This study demonstrated that, in the simulation of evolving
boolean networks in a fluctuating task environment, an
increase in evolvability was concomitant with an increase
in the robustness of genotypes coding for the target
phenotypes. These results support the hypothesis that evolvability,
defined as adaptability, is driven by biases in the genotype
space; however, visualizations of the genotype space showed
unexpected structure in the nature of these biases. It was
found that instead of biases towards different task targets
occurring in the same region, in some instances they were
separated into adjacent regions. These results contribute to a
growing body of work (Wagner, 2008; Ciliberti et al., 2007a;
Draghi et al., 2010) demonstrating the process by which
robustness facilitates evolvability. However, elucidating the
exact nature of the robustness-evolvability relationship
remains the subject of ongoing research. Moreover, the impact
of experiment parameters, notably the frequency of
environment change, is yet to be determined.

Figure 2: Visualizations of genotype space in the region of the best genotype at the end of an evolutionary run. Red dots are
for genotypes that code for task target one and blue dots are for genotypes that code for target two. The positioning of the dots
was determined using the t-SNE algorithm (Van der Maaten and Hinton, 2008) which aims to preserve the distance between the
dots in the higher dimensional space. The left column contains visualizations based on target pair one and the right for target
pair two. The top row contains visualizations for NAND networks and the bottom for threshold networks.
Extended Abstract

This work investigates evolvability of continuous-time recurrent
neural networks to support the behavior of model-agents subject
to fitness criteria that change over the evolutionary timescale. A
population of agents is alternatingly evolved to perform two
tasks with inverted fitness awards. Evidence of evolvability is
reported; it is shown that the population locates a region of
"meta-fitness" in the landscape in which sub-regions of
optimality for each task are easily accessible from one another.

This work investigates evolvability of continuous-time 
recurrent neural networks (CTRNN) to support the behavior of 
model-agents subject to fitness criteria that change over the
evolutionary timescale. In this way, two broad thrusts of
interest in recent computational approaches to artificial life and
cognitive science are synthesized. On the one hand,
evolvability is a topic of interest in and of itself: what kinds
of developmental and phenotypic characteristics enable
organisms to successfully adapt to changing environments?
This question was raised by Richard Dawkins (1989), and has
since received attention in paradigms ranging from binary
circuits and discrete-time feed-forward neural networks to toy
developmental scenarios (Kashtan and Alon, 2005; Kovitz,
2015).

On the other hand there is work motivated by the joint 
perspectives of situated and embodied cognition, in which 
CTRNNs are embedded in agents with basic sensory-motor 
capacities. These agents are themselves embedded into 
dynamical environments, and genetic algorithms are used to 
evolve their nervous systems for minimally cognitive behavior,
a term borrowed from Randall Beer to indicate “the simplest 
behavior that raises issues of genuine cognitive interest” (Beer, 
1996). This methodology facilitates an investigation of 
cognition as it manifests in adaptive behavior occurring in 
dynamically coupled brain-body-environment systems. 
Furthermore, using stochastic search methods to configure 
nervous systems supporting cognitively interesting behaviors 
minimizes a priori assumptions about the kinds of cognitive 
and representational capacities an agent needs to support said 
behaviors. 

More specifically, this work is inspired by two projects on 
either side of the aforementioned motivational fault line. 
Kashtan and Alon (2005) demonstrated that evolvability (via 
modularity) can be achieved in a connectionist network by 
alternatingly evolving it to perform two separate but related
tasks over many epochs. Their work was conducted with
feed-forward binary networks evolved to perform discrete logical
operations and association tasks. Motivated by the joint 
perspectives of embodied and situated cognition, we apply a 
similar methodology to a model-agent controlled by a CTRNN. 
Here the different tasks correspond to cognitively interesting 
behaviors carried out in a dynamical environment. 

This evolvability study extends an object discrimination task 
first developed by Beer (1996), in which an agent with an array 
of distal sensors is required to distinguish circles from lines - 
moving towards the former and away from the latter in a 
simulated 2D environment. As in previous work (Beer, 1996), 
fitness is assigned on a scale of 0-100%, where 50% 
corresponds to a random solution and 100% is assigned to an 
optimal solution. Here, a population is evolved to perform this 
task (Task A) until the best individual reaches a fitness 
threshold of 80%, at which point the fitness criterion is reversed
so that agents have to move towards lines and avoid circles
(Task B); these fitness reversals will henceforth be called
swaps. Note that Task A and Task B are mutually exclusive:
an agent with high fitness for one of them has proportionately
low fitness for the other. The population is evolved in this
alternating fashion, which we refer to as evolutionary
swapping, for 2500 generations.
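
The swapping schedule can be sketched as a simple loop. This is an illustrative toy rather than the evolutionary algorithm used here: the callback `step` stands in for one generation of evolution, Task B fitness is assumed to be 100 minus Task A fitness, and the scalar "population" in the usage example is hypothetical.

```python
def run_swapping(step, generations, threshold=80.0):
    """Alternate between Task A and Task B whenever the threshold is reached.

    `step(task)` advances the population by one generation on `task` and
    returns the best individual's Task A fitness (0-100).
    Returns a list of (generation, task) pairs, one per swap.
    """
    task = "A"
    swaps = []
    for generation in range(1, generations + 1):
        best_a = step(task)
        # Assumed inversion: fitness on Task B is 100 minus fitness on Task A.
        best = best_a if task == "A" else 100.0 - best_a
        if best >= threshold:
            swaps.append((generation, task))
            task = "B" if task == "A" else "A"
    return swaps


class ToyPopulation:
    """Stand-in for a population: one scalar that drifts 5 points per step."""

    def __init__(self):
        self.best_a = 50.0  # 50% corresponds to a random solution

    def step(self, task):
        self.best_a += 5.0 if task == "A" else -5.0
        return self.best_a


swaps = run_swapping(ToyPopulation().step, generations=30)
```

In this toy run the first swap occurs once Task A fitness reaches 80%, after which the population has to travel the full distance to the opposite extreme before the next swap.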

To benchmark the success of evolutionary swapping, populations
were first evolved on each task in isolation. For each task, 20
evolutionary runs were performed over 500 generations with a 
population of 240. Across all 20 runs, the average value for the 
fitness of the best individual was 79.1% in Task A and 70% in 
Task B, suggesting that Task B is more difficult than Task A. 
More importantly, this indicates that the 80% threshold used in 
our swapping paradigm is non-trivial - in 500 generations only 
8 of 20 runs in Task A produced agents exceeding this 
threshold, while no agents reached 80% fitness on Task B. In 
line with this, out of 20 evolutionary swapping runs, only 8 
achieved one or more swaps. This is not so surprising, as the 
first swap corresponds to achieving 80% on Task A given a 
random starting population, which we know is not guaranteed 
to occur. 

While there is not enough positive data to say that in general 
the swapping methodology makes it easier to reach the fitness 
threshold, particular swapping runs do show clear evidence of 
evolvability. Figure 1 shows the fitness trajectory for one such 
run. This trajectory is best characterized as periods of ascent
along a fitness gradient to a peak, at which point fitness suddenly
drops off. Peaks correspond to the fitness threshold (indicated
by the dotted black line) being reached; the subsequent
drop-offs are a result of task swapping. Recall that fitness measures
of each task are inversions of one another, so we should expect
a population with high fitness with respect to one task to have
a sudden drop in fitness when tasks are swapped. Twelve swaps were
achieved (marked by dotted red lines); the first 4 are labeled for
the sake of discussion.

The horizontal distance between successive swaps 
corresponds to the number of generations taken to reach the 
fitness threshold for one of the tasks starting with the 
population of the previous swap (i.e. of the opposite task); this 
will henceforth be called the swapping interval. To the extent 
that evolvability is achieved in this paradigm, the swapping 
interval should be less than the number of generations it takes 
to reach 80% when either task is evolved for in isolation. We 
would also expect swapping intervals to generally decrease 
over the course of a run. Both of these properties are 
demonstrated in Figure 1. Starting from a random population, 
80% fitness with respect to Task A is reached after 245
generations (A1). It then takes 1408 generations to achieve 80%
fitness on Task B (B1), followed by a 198-generation swapping
interval for the second evolution on Task A (A2), and then a
45-generation swapping interval for the second evolution on Task
B (B2). The average swapping interval for the 8 remaining
swaps is about 73 generations. That these short swapping 
intervals occur successively marks a huge improvement over 
evolving for either task in isolation, where for the most part the 
80% threshold is not achievable in 500 generations. 
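
The arithmetic behind these intervals can be checked with a short sketch. The swap generations below are derived from the intervals reported above by cumulative summation; the function name is illustrative.

```python
def swapping_intervals(swap_generations):
    """Generations elapsed between successive swaps (first one from gen 0)."""
    intervals = []
    previous = 0
    for generation in swap_generations:
        intervals.append(generation - previous)
        previous = generation
    return intervals


# Generations at which A1, B1, A2 and B2 occur, from the reported intervals.
first_four = swapping_intervals([245, 1653, 1851, 1896])

# Generations left in a 2500-generation run for the 8 remaining swaps.
remaining_budget = 2500 - 1896
```

The four labeled swaps use 1896 of the 2500 generations, leaving 604 generations, which is consistent with 8 further swaps at an average of about 73 generations each (8 x 73 = 584).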

How did such a dramatic increase in evolutionary efficiency 
occur? One possibility is that a highly diffuse population is
generated, one which simultaneously contains high-fitness
individuals for both tasks. This would result in small swapping
intervals during which the population could remain relatively 
static. Alternatively, it could be that the population locates an 
area of “meta-fitness” in the landscape, in which sub-regions of 
optimality for each task are easily accessible from one another. 
This would allow for the population to cluster around a region 
of optimality for a given task and then quickly transition to a 
region of optimality for the other task in the following 
swapping interval. These two explanations are not mutually 
exclusive, but it is useful to differentiate between them, as one 
could be more at play than the other. 

To help answer this question, principal component analysis 
(PCA) was performed on the 47-dimensional set of all 
genotypes in populations at A1, B1, A2 and B2. Figure 2A
shows these populations in a reduced 2D space. The first thing 
to notice is that populations at each swap are clustered around 
roughly distinct regions of parameter space (A2 and B2 appear 
to be overlaid but on closer examination it becomes evident that 
they are clustered in distinct regions, see Figure 2B). This 
provides support for the second explanation, that the population 
as a whole is moving through parameter space during swapping 
intervals, as opposed to a static and highly diffuse population. 
Furthermore, the proximity of the population clusters to one 
another sheds light on the swapping intervals displayed in 
Figure 1. The long duration of A1-B1 corresponds to a large
distance in parameter space separating A1 and B1, while the
relatively close proximity of B1 to A2 is consistent with the
short evolutionary duration of B1-A2 necessary to make such a
transition in parameter space. Following in this manner, A2 and
B2, which have the shortest reported swapping interval, also
appear to be closest in parameter space.

Thus it appears that the search found a region of meta-fitness
in the landscape, in which sub-regions of optimality for either
task lie close to one another. Presumably the close
proximity between these sub-regions translates to important 
structural relations between agents with high fitness on either 
task. Identifying what these relations are, and investigating how 
they underpin the dynamical behaviors conducive to success in 
each task constitutes promising work for the future. 



Figure 1: Evolutionary swapping run. Red-dotted vertical lines 
mark task swaps, the black-dotted horizontal line represents the 
80% fitness threshold. The first 4 task swaps are successively 
labeled A1, B1, A2 and B2.

Figure 2: Populations at task swaps projected onto reduced 2D
spaces. Figure 2A depicts populations of A1 (blue), B1
(yellow), A2 (green) and B2 (red) in a 2D space obtained
from a PCA on their union. Figure 2B depicts A2 (blue) and B2
(yellow) in a separate 2D space, obtained from a PCA on their
union.
Introduction The question of how altruism evolves despite
selfish behaviors benefitting agents has for many years been a
central theme in fields such as biology, ecology, sociology,
economics and psychology. Several mechanisms promoting the
evolution of cooperation have been proposed and can be 
broadly classified into two categories: direct fitness benefits or 
indirect fitness benefits (West, 2011). The former covers mutu¬ 
ally beneficial cooperation, profiting both the actor and the re¬ 
cipient of the behavior (hence not truly altruistic). Altruistic co¬ 
operation is explained by the latter: by helping a close relative 
reproduce, an individual is still passing copies of its genes on 
to the next generation, albeit indirectly. Two mechanisms allow 
related individuals to interact: kin discrimination and limited 
dispersal. 

The prisoner’s dilemma game has been commonly used in 
computational studies on the emergence of altruism. Over the 
past two decades, many researchers investigated its behavior on 
networks. Two main properties of the network structure
were found to favor the survival of cooperators. The first is
the clustering coefficient. The success of cooperative behavior is
maintained by local interactions within a spatial structure, be¬ 
cause cooperators can survive and grow only if they form clus¬ 
ters (Nowak and May, 1992). However, an inverse relationship 
between the formation of clusters and the success of coopera¬ 
tion has also been reported (Hauert and Szabo, 2005). The se¬ 
cond network property is degree heterogeneity. Cooperation 
has been shown to emerge around the largest hub (Pacheco and 
Santos, 2005). 

While it seems that the riddle of cooperation has been largely 
elucidated as above, recent empirical research brings a more 
refined view of cooperation as not just a single, homogeneous 
trait but several different traits with different costs, benefits and 
contexts. Warneken and Tomasello (2009) investigated three 
types of altruism by comparing behaviors of children and chim¬ 
panzees. The three types were: helping (when agents help oth¬ 
ers achieve their goals), sharing (when agents share valuable 
goods such as food with others) and informing (when agents 
share with others things the others need or want to know). They 
found that although chimpanzees help others instrumentally, 
they are less likely to share resources at their own expense. 
Also, they do not share information helpful to others. However, 
both infants and young children were observed to be helpful, 
generous, and informative. Thus, the authors suggest that shar¬ 
ing and informing are types of altruism specific to humans. 

Based on the above, we have proposed the distribution
dilemma game (DDG) (Ueno and Arita, 2015), which aims to
model altruism in the distribution of resources (material
goods or information) but can also capture the emergent
properties of resources. Specifically, DDG can describe both how
the total value of a resource changes when it is distributed
among agents and how the value changes synergistically
when an agent owns different kinds of resources. Our
preliminary study investigated the behavior of an evolutionary model
with DDG on a one-dimensional torus and observed the
emergence of altruism in certain scenarios. In this paper, we extend
the study by investigating the effects of more realistic
interaction networks on the emergence of altruism. Specifically, we
focus on the behavior of DDG on small-world network
topologies.

The model Agents are placed on a network, which is generated
using the Watts-Strogatz model. DDG is composed of a repetition
of a resource distribution step and a strategy imitation step. In
the distribution step, each agent distributes a unit of its unique
kind of resource equally among its neighbors and itself. Each
agent has its strategy represented by an integer value S: the
number of nearest neighbors to whom it distributes resource (0
means the agent is selfish). The initial values are randomly set
to 0 or 1. If S is larger than the number of direct neighbors,
recipients are selected from the neighbors of neighbors (and further
if necessary).

In the imitation step, each agent adopts the strategy (S) of the
neighbor who obtained the highest gain in the last distribution
step. Mutation changes each S by +1 or -1 with a probability of
0.01 during imitation.
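
A minimal sketch of the imitation step follows. Several details are assumptions made for illustration and are not stated in the text: updates are synchronous, an agent compares only its neighbors (not itself), ties go to the first neighbor listed, and S is clamped at 0 so that it stays a valid recipient count.

```python
import random


def imitation_step(strategies, gains, neighbors, rng, mutation_rate=0.01):
    """Return the next strategies after one synchronous imitation step.

    strategies[i] is agent i's S, gains[i] its gain in the last distribution
    step, and neighbors[i] the list of its direct neighbors.
    """
    next_strategies = []
    for i in range(len(strategies)):
        # Copy S from the neighbor with the highest gain.
        best_neighbor = max(neighbors[i], key=lambda j: gains[j])
        s = strategies[best_neighbor]
        # Mutation: shift S by +1 or -1 with the given probability.
        if rng.random() < mutation_rate:
            s = max(0, s + rng.choice((-1, 1)))  # assumed lower bound of 0
        next_strategies.append(s)
    return next_strategies


# Three agents on a triangle; mutation is switched off for a deterministic run.
updated = imitation_step(
    strategies=[0, 1, 2],
    gains=[1.0, 3.0, 2.0],
    neighbors=[[1, 2], [0, 2], [0, 1]],
    rng=random.Random(0),
    mutation_rate=0.0,
)
```

Agents 0 and 2 copy agent 1, whose gain is highest, while agent 1 copies the better of its two neighbors, agent 2.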

The gain of an agent (G_i), i.e. the resultant value of resource
each agent owns at the end of a round, is calculated using the
following equations, with D and K being model parameters. D
and K express the extent to which the overall value of one's
resources is affected by the act of resource distribution.

G_i = \left( \sum_{j} F_{ij} \right) \left( 1 + K V_i \right), \qquad (1)

F_{ij} = \left( \frac{1}{1 + S_j} \right)^{D}, \qquad (2)

V_i = -\sum_{j=1}^{N} P(q_{ij}) \log_2 P(q_{ij}), \qquad (3)

P(q_{ij}) = \frac{F_{ij}}{\sum_{j} F_{ij}}. \qquad (4)


Distributing property D (see Eq. 2) determines the type of
resources being shared. D = 0 expresses that the resource is
purely informational (each receiver obtains the entire copy),
whereas D = 1 expresses that the resource is purely physical (each
receiver obtains an equal share). Furthermore, considering
that the value of a resource can change not only through the
cost of distribution but also with the number of people who
share it, we treat various types of resources in a unified
fashion by assuming D is a real number.

Figure 1: The average number of recipients to whom agents
distribute resources (corresponding to their degree of altruism)
as a function of W-S network rewiring probability p and for
different values of the distributing property coefficient D. The
left (a, b) and right (c, d) graphs show the cases with K = 0 and
K = 1, respectively. The upper (a, c) and lower (b, d) graphs
show the cases with D = 0.2, 0.4, 0.6, and with D = 0.8, 1.0,
respectively.

Gathering property K allows us to model the synergistic
effect of owning different goods by using the idea of entropy. For
example, a certain set of knowledge combined together may
lead to the creation of a new idea. For values of K larger than
0, the gain coming from received goods is increased beyond the
sum of their values.
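
The roles of D and K can be sketched in code. The sketch rests on one reading of the gain equations, which should be treated as an assumption: each distributor with S recipients gives (1/(1+S))^D to itself and to each recipient, the gathering term is the base-2 entropy of the proportions of resources an agent holds, and the gain is the total held multiplied by (1 + K x entropy). All function and variable names are illustrative.

```python
import math


def received_amounts(recipients, D):
    """Matrix F where F[i][j] is the amount agent i holds of j's resource.

    recipients[j] lists the agents that j distributes to (excluding itself),
    so len(recipients[j]) plays the role of j's strategy S.
    """
    n = len(recipients)
    F = [[0.0] * n for _ in range(n)]
    for j in range(n):
        share = (1.0 / (1 + len(recipients[j]))) ** D
        for i in [j] + list(recipients[j]):
            F[i][j] = share
    return F


def gain(held, K):
    """Gain of one agent from the amounts it holds of each resource kind."""
    total = sum(held)
    if total == 0:
        return 0.0
    entropy = -sum((f / total) * math.log2(f / total) for f in held if f > 0)
    return total * (1 + K * entropy)


# Two agents that each share with the other (S = 1).
material = received_amounts([[1], [0]], D=1.0)      # each holds 0.5 + 0.5
information = received_amounts([[1], [0]], D=0.0)   # each holds 1 + 1
selfish = received_amounts([[], []], D=1.0)         # each keeps its own unit
```

Under this reading, sharing a purely material resource with K = 0 leaves the gain at 1, the same as keeping everything; sharing information raises it to 2; and K = 1 doubles the gain of the material sharers because they hold two kinds of resource in equal proportion.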


Results We investigated the behavior of the system by
varying the distributing property of resource D between 0
(information) and 1 (material goods), with the gathering property K
being either 0 or 1, where K = 1 corresponds to a maximum
synergistic effect of gathering different resources. The network
has a population size of N = 65, an initial node degree of 4 and
a rewiring probability p (p = 0 gives a regular network and p = 1
a random network). Each game lasted 100000 generations and
was repeated 100 times using different random seeds.
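
For concreteness, a network of this kind can be generated with a short pure-Python sketch of the Watts-Strogatz procedure. It is an illustration under the stated parameters (N = 65, initial degree 4), not the generator used in the experiments, and the function name is an assumption.

```python
import random


def watts_strogatz(n, k, p, rng):
    """Ring lattice of n nodes with degree k, each edge rewired with prob. p.

    Returns an adjacency dict mapping each node to the set of its neighbors.
    """
    adj = {i: set() for i in range(n)}
    # Start from a regular ring: each node links to k/2 neighbors on each side.
    for i in range(n):
        for offset in range(1, k // 2 + 1):
            j = (i + offset) % n
            adj[i].add(j)
            adj[j].add(i)
    # Rewire the far end of each lattice edge with probability p.
    for offset in range(1, k // 2 + 1):
        for i in range(n):
            j = (i + offset) % n
            if j in adj[i] and rng.random() < p:
                candidates = [c for c in range(n) if c != i and c not in adj[i]]
                if not candidates:
                    continue
                target = rng.choice(candidates)
                adj[i].discard(j)
                adj[j].discard(i)
                adj[i].add(target)
                adj[target].add(i)
    return adj


regular = watts_strogatz(65, 4, 0.0, random.Random(0))
rewired = watts_strogatz(65, 4, 0.5, random.Random(0))
```

With p = 0 every node keeps degree 4. Rewiring moves edges without creating or destroying them, so the total number of edges (130) is the same for every p.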

For K = 0 and resources close to material goods (D = 0.8, 1),
a very limited propensity to share emerged, and only with low
values of p (p < 0.1, Fig. 1b), in other words, with high spatial
locality. For resources having a property closer to the
information side (D = 0.6, 0.4, 0.2), the most altruistic agents
emerged for intermediate values of p. Interestingly, we found
that this range of p values corresponds to the networks having
the highest small-world-ness coefficient (cf. Fig. 2a), measured
using the method of Humphries and Gurney (2008).
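
The small-world-ness coefficient compares a network's clustering and path length with those of a random graph of the same size: it is the ratio of normalized clustering (C / C_rand) to normalized average path length (L / L_rand). A minimal sketch, assuming an adjacency-dict representation and reference values for the random graph supplied by the caller:

```python
from collections import deque


def clustering_coefficient(adj):
    """Mean local clustering coefficient over all nodes."""
    total = 0.0
    for nbrs in adj.values():
        nb = list(nbrs)
        k = len(nb)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        links = sum(
            1 for a in range(k) for b in range(a + 1, k) if nb[b] in adj[nb[a]]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all connected ordered pairs (BFS)."""
    total = 0
    pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def small_worldness(adj, c_rand, l_rand):
    """(C / C_rand) / (L / L_rand); c_rand and l_rand come from a random graph."""
    c = clustering_coefficient(adj)
    l = average_path_length(adj)
    return (c / c_rand) / (l / l_rand)


complete4 = {i: {j for j in range(4) if j != i} for i in range(4)}
cycle6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
```

Values well above 1 indicate a network that is much more clustered than a random graph while retaining a comparably short average path length.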

In the scenario with synergistic resource effects (gathering
property coefficient K = 1), the results for resources having a
property closer to the information side were similar (Fig. 1c).
The agents, however, became on average even more altruistic.
The situation was very different for resources closer to material
goods. Truly altruistic agents now also emerged in networks
having the highest small-world-ness coefficient (Fig. 1d).




Figure 2: (a) Small-world-ness coefficient as a function of re¬ 
wiring probability p in the Watts-Strogatz model. Also shown: 
clustering coefficient and average path length normalized by
their values for the regular lattice (p = 0). (b) Correlation
between S and small-world-ness for K = 0, D = 0.2.

We suspect that the small levels of altruism emerging in the
scenario shown in Fig. 1b can be explained by the mechanism
of spatial locality: on regular networks, close cooperators
survive by benefitting each other (cooperative clustering). In the
small-world networks, however, we think altruism emerges
owing to degree heterogeneity (Pacheco and Santos, 2005).
More precisely, the agent at the center of a hub has a high
number of partners for interactions and can therefore obtain a
higher gain than an agent with fewer neighbors (assortative
interactions). This causes agents around hubs to imitate the
altruistic strategy of the hub’s central node, potentially spreading
the strategy further.

Conclusion The correlation between small-world-ness and the
emergence of altruistic behavior observed in the DDG model
suggests that small-world networks may be essential for the
emergence of altruism. Furthermore, we found that the
gathering property of resources further strengthens the propensity of
agents to behave in an altruistic way, which would otherwise
not happen for resources that resemble material goods. We
believe that the DDG model sheds new light on the importance of
social network structure for the emergence of cooperation, as
well as on the powerful influence of synergistic resource effects.
Abstract

Social learning plays a key role in the evolution of cooperation 
in humans and other animals. It has also been shown, both
theoretically and experimentally, that environmental adversity is
a key determinant of the evolution of cooperation among
individuals. Here we investigate the impact of social learning 
on the evolution of cooperation in the context of a range of 
levels of environmental adversity. We used an agent-based 
simulated world of asexual individuals that communicate and 
play a probabilistic version of the Prisoner’s Dilemma game. 
We considered simulated worlds either with or without random 
spreading of the offspring and two variants of social learning, 
either copying to some extent all communication rules or 
copying fully some of the communication rules of the best 
performing neighbor individual. The results show that in the
case of spreading of the offspring, social learning increases the
level of cooperation and reverses its association with the
level of environmental adversity, i.e. low adversity with
social learning implies a higher level of cooperation. Copying
fully some communication rules also increases the steady-state 
level of communication complexity in the simulated agent 
communities. The results suggest that the level of cooperation 
in communities of individuals may get boosted alternately by 
highly adverse environments and by layers of social learning in 
low adversity environments. 

Introduction 

The emergence and evolution of cooperation among 
individual humans and animals is a fundamental question of 
social evolution (Axelrod, 1997). In general it is assumed that 
cooperation emerges either because of kin selection, or direct 
or indirect reciprocation, or because of some form of social 
clustering or due to the group level selection of groups with 
more cooperating individuals (Rand and Nowak, 2013). It has
also been shown that social factors, such as enforcement of
rule following, contribute significantly to the evolution of
cooperation (Sigmund et al., 2010).

An external factor that has critical influence on the level of 
cooperation is environmental adversity, which includes both 
the harshness of the environment (i.e. scarcity of resources) 
and the variability of the environment (i.e. the variability and 
extent of the lack of predictability of the level of available 
resources) (Andras et al., 2007; Andras et al., 2003).
Theoretical, simulation and real-world experimental results
confirm that in general a higher level of environmental
adversity implies a higher level of cooperation in communities
of individuals existing in the presence of such environmental
constraints (Krams et al., 2010; Spinks et al., 2000; Rand et al.,
2014; Potts and Faith, 2015).

Social learning in general means that individuals within a 
community adapt their behavior such that they follow 
behavioral patterns of other individuals (Flinn, 1997). The 
other individual may be chosen on various grounds, for 
example it can be the most successful individual in some pre¬ 
defined sense or it can be the oldest neighboring individual. It 
has been suggested that social learning supports cooperation 
in communities of individuals, especially in the context of 
humans (Boyd and Richerson, 2009); however, there are also
claims of the opposite effect in the relevant literature (Heyes,
2013).

The level of cooperation can be measured directly in 
experiments (both in the case of agent-based simulations and 
in the real world). Having additional measures of correlates of 
cooperation is useful to understand better the context of the 
measured level of cooperation. One such measure is the 
communication complexity of the interactions between the 
individuals of the community (Andras, 2008). 
Communication complexity decreases with increased 
environmental adversity and this contributes to the increase in 
the level of cooperation (Andras, 2008). 

Here we present the use of an agent-based simulation 
which implements communicating agents that play prisoner’s 
dilemma games (Axelrod, 1997) to investigate the role of 
social learning in the context of evolution of cooperation in 
communities of selfish individuals. We considered two 
variants of the simulated world, one where the offspring of 
the asexual individuals do not get dispersed widely from the 
location of their parent, and another where the offspring are
dispersed away from the location of the parent. In general,
simulations without dispersal of the offspring have higher
levels of cooperation at the steady state than
simulations with offspring dispersal (Andras et al., 2003;
Andras et al., 2006). We also considered two variants of social
learning: in one case the learner copies to some extent the
communication rules of the most successful neighbor; in the
other case the learner copies fully some of the communication
rules of the most successful neighbor. The results show that
copying fully some communication rules increases the level of 
cooperation more than the considered alternative social 
learning method. Both social learning methods have much 
more effect if the offspring are widely dispersed. Social 
learning with full copying of some communication rules also
reverses the association between environmental adversity and 
the level of cooperation, making the level of cooperation 
increase with the decrease of the environmental adversity. In 
addition, social learning with full copying of some 
communication rules leads to higher communication 
complexity in the simulated agent communities than the use 
of the other social learning method or simulations without any 
social learning. The results suggest that the level of 
cooperation in communities of individuals may get boosted 
alternately by highly adverse environments and by layers of 
social learning in low adversity environments. 

The rest of the paper is structured as follows. First we 
review briefly the relevant results from the literature. Next we 
provide a description of the agent-based simulation that we 
used. This is followed by the presentation of the detailed 
results. Finally the paper is closed by the discussion and 
conclusions section. 


Background 

There are several theories about the mechanisms behind the 
emergence and evolution of cooperation in communities of 
selfish individuals. Kin selection assumes that related 
individuals recognize each other on the basis of their 
similarity and the likelihood of their cooperation with their 
kin is high in order to support the success and spreading of 
the kin (Rand and Nowak, 2013). Direct reciprocity assumes 
that individuals are likely to reciprocate the cooperative help 
received from others and expect further reciprocation of 
cooperation by others who benefit from this (Rand and 
Nowak, 2013). Indirect reciprocity relies on the assumption 
that individuals observe the behavior of other individuals and 
they are more likely to cooperate with those who are seen to 
cooperate with others (Rand and Nowak, 2013). Group 
selection based mechanisms assume that individuals 
belonging to groups characterized by higher level of 
cooperation are more likely to survive and have offspring 
because their group has a better chance of survival as a group 
due to the benefits from the high level of cooperation within 
the group (Rand and Nowak, 2013). Other models rely on 
emergent population structure (e.g. spatial constraints) that
drives cooperators together and excludes non-cooperators,
thereby giving an advantage to the emergent communities
of cooperators over other emergent communities not
dominated by cooperators (Rand and Nowak, 2013).

Environmental adversity is an important determinant of the 
emergence and evolution of cooperation (Andras et al, 2007; 
Andras et al, 2003). Theoretical analysis shows that higher 
environmental adversity (harsher environment or more 
variable environment) implies higher level of cooperation 
among individuals in communities that survive in high 
adversity environment (Andras et al, 2007; Andras et al, 
2003). Experimental results on a range of animals and humans,
together with agent-based simulation results, confirm this
theoretical result, showing that exposure to higher predation risk
or to higher variability of environmental resources or risks leads
to a higher frequency of cooperative behavior between
individuals (Krams et al, 2010; Spinks et al, 2000; Rand et al,
2014).


Theoretical and agent-based simulation analysis of the 
environmental risk conceptualized as the variance of the 
available resources shows that the experienced subjective risk 
is always bigger than the objective risk and the difference is 
bigger in harsher environments (Andras et al, 2007). It has
also been shown that the effective risk, after taking into
account the effect of cooperation, is always smaller than the
subjective risk, that the effective risk is practically stable
across a range of subjective risk levels, and that this stable
effective risk increases slightly with the harshness of the
environment (Andras et al, 2006). This implies that higher
subjective risk perceived by individuals triggers more
cooperation in order to bring the effective risk down to this
stable level.

In the context of communicating individuals who negotiate
before making the decision about cooperation, the complexity
of the communication language that they use contributes to 
the overall environmental risk. It has been shown through 
agent-based simulation studies that indeed communication 
complexity is lower in the case of higher external 
environmental risk (Andras, 2008). Note that the language 
complexity is measured in terms of the variability of the 
communication rules and not as the length of communication 
sequences preceding the decision on cooperation. This result 
implies that the communication complexity measure is a 
useful correlate of the extent of reduction of the effective risk 
through reduction of the unreliability of communications 
between individuals. 

Social learning plays a key role in organizing the social role 
of individuals in the context of their social environment 
provided by their community (Flinn, 1997). The essence of
social learning is the copying or imitation of the behavior of one
individual by another individual. There are several
mechanisms of social learning: some are context-dependent,
others are content-dependent; some are oriented towards
specific individuals (e.g. richest, most successful, oldest, most
similar), others are driven by the frequency of behaviors (e.g. the
most frequent is copied) or by the state of individuals (e.g.
experiencing high dissatisfaction) (Rendell et al, 2010). Social
learning may work by copying fully or partially a whole
sequence of consecutive behaviors, by aiming to emulate
the outcome of a sequence of behaviors, or by some
intermediate variant of behavioral copying (Rendell et al,
2010). Social learning may also be supported by enforcement
of rules in various forms of punishment applied to individuals 
who do not conform to the rules (Sigmund et al, 2010). 

Agent-based simulations have been used to study various 
aspects of social learning (e.g. choice of social learning 
mechanisms) (Nakahashi et al, 2012; Seltzer and Smirnov, 
2015; Molleman et al, 2013). Such simulations usually 
implement a small range of alternative social learning 
mechanisms and analyze their impact on the behavior of the 
simulated agent community. 

The role of social learning in the context of emergence and 
evolution of cooperation has been considered in a number of 
settings. In general it is suggested that social learning is a key 
contributor to the evolution of cooperation among humans 
and possibly also among other animals (Boyd and Richerson, 
2009; Rendell et al, 2010; Chudek et al, 2013). It has been
shown that in simulated social networks imitation of socially
distant individuals increases the level of cooperation within the
agent community (Seltzer and Smirnov, 2015). Other agent-based
simulation studies show that certain forms of social
learning (e.g. conformism) reduce the level of cooperation in
simulated communities (Molleman et al, 2013; Burton et al,
2015). There are also more theoretical / conceptual 
investigations that question the level of contribution of social 
learning to the emergence and evolution of cooperation 
among humans (Heyes, 2013). 

Simulated Agent Communities 

The simulated world of agents is placed in a two dimensional 
space arranged as a torus in both dimensions and having the 
size of 1000 in both dimensions. The agents move randomly 
in this space in each turn (up to 5 units in both dimensions). 
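
The movement rule can be illustrated with a minimal Python sketch. It is not the Delphi code used for the paper; it assumes continuous coordinates, a uniformly drawn step in each dimension and Euclidean distance, none of which is stated in the text. The names `move_on_torus` and `torus_distance` are hypothetical.

```python
import random

WORLD_SIZE = 1000.0  # side length of the world in both dimensions (from the text)
MAX_STEP = 5.0       # maximum move per dimension and turn (from the text)


def move_on_torus(x, y, rng=random):
    """Return the position after one random move with wrap-around.

    Assumption: the step is drawn uniformly from [-MAX_STEP, MAX_STEP]
    in each dimension; the text only gives the upper bound.
    """
    dx = rng.uniform(-MAX_STEP, MAX_STEP)
    dy = rng.uniform(-MAX_STEP, MAX_STEP)
    # The modulo implements the torus: leaving one edge re-enters at the opposite edge.
    return (x + dx) % WORLD_SIZE, (y + dy) % WORLD_SIZE


def torus_distance(a, b):
    """Shortest Euclidean distance between two points on the torus."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    dx = min(dx, WORLD_SIZE - dx)
    dy = min(dy, WORLD_SIZE - dy)
    return (dx * dx + dy * dy) ** 0.5
```

A distance of this kind would be needed wherever the simulation refers to the spatial neighborhood of an agent, since two agents near opposite edges are in fact close to each other.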

Each simulation runs for 400 time turns. In each turn each 
agent picks randomly another agent from its spatial 
neighborhood to interact with. An agent is allowed to interact
with only one other agent at any time, and some agents may
remain without an interaction partner in some of the time turns.

The agents own resources and they spend these to survive. 
If the resource amount of an agent goes below zero the agent 
dies. The agents use their current level of resources to set their 
level of resources in the next turn. They may also play a 
resource generation game with their interaction partner. 

The agents interact using a communication language
consisting of the symbols ‘0’, ‘s’, ‘i’, ‘y’, ‘n’, ‘h’ and ‘t’. The
meanings of the communication symbols are as follows: ‘0’ -
no intention of communication, ‘s’ - start of communication,
‘i’ - maintaining the communication, ‘y’ - indication of the
willingness to engage in resource sharing, ‘n’ - indication
of no further interest in communication, ‘h’ - effective
sharing of the resources, ‘t’ - not sharing the resources after
an indication of willingness to engage in sharing. The last
two symbols, ‘h’ and ‘t’, effectively mean the resource-sharing
or no-resource-sharing actions of the agents. The generation
of communication symbols by agents is determined by
probabilistic communication rules of the agents. These rules
are expressed as follows

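$$L:\quad U_{\text{current}},\ U'_{\text{current}} \xrightarrow{p_1} U_{\text{new},1},\ \xrightarrow{p_2} U_{\text{new},2},\ \ldots,\ \xrightarrow{p_k} U_{\text{new},k} \qquad (1)$$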

where $U_{\text{current}}$ is the current communication symbol produced
by the agent, $U'_{\text{current}}$ is the current communication symbol
produced by the communication partner agent, $U_{\text{new},j}$ is the j-th
possible communication symbol that may be produced by the
agent following the previous production of the symbol $U_{\text{current}}$
and the production of the symbol $U'_{\text{current}}$ by the
communication partner agent, and $p_i$ is the probability of
producing the symbol $U_{\text{new},i}$. Naturally we have that
$p_1 + p_2 + \ldots + p_k = 1$. For example, a communication rule can be the
following

$$L:\quad i,\ i' \xrightarrow{0.5} i,\ \xrightarrow{0.2} y,\ \xrightarrow{0.3} n \qquad (2)$$

which means that after producing the symbol ‘i’ and receiving
the symbol ‘i’ from the communication partner, the agent will
produce the symbol ‘i’ with probability 0.5, the symbol ‘y’ with
probability 0.2 and the symbol ‘n’ with probability 0.3.
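
As an illustration of how such a rule can drive symbol generation, the following Python sketch stores the example rule in a dictionary and draws the next symbol from it. The data layout and the names `RULES` and `next_symbol` are assumptions made for this sketch, not part of the original implementation.

```python
import random

# The example rule from the text, keyed by (own symbol, partner symbol).
# The dictionary layout is an assumption made for illustration.
RULES = {
    ("i", "i"): {"i": 0.5, "y": 0.2, "n": 0.3},
}


def next_symbol(rules, own, partner, rng=random):
    """Draw the agent's next symbol according to the matching rule."""
    outcomes = rules[(own, partner)]
    if abs(sum(outcomes.values()) - 1.0) > 1e-9:
        raise ValueError("rule probabilities must sum to 1")
    symbols = list(outcomes)
    weights = [outcomes[s] for s in symbols]
    # random.choices performs one weighted draw over the possible symbols.
    return rng.choices(symbols, weights=weights, k=1)[0]
```

Calling `next_symbol(RULES, "i", "i")` repeatedly should return ‘i’ in roughly half of the draws.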


An example of a sequence of communications between two
agents is: ‘$s_1, s_2, i_1, i_2, i_1, i_2, y_1, i_2, y_1, n_2$’, where the indices are
the identifiers of the two agents. If the communication process
between two agents carries on for too long without reaching 
the production of the action symbols ‘t’ or ‘h’ (the length 
limit was set to 20 symbols), the communication terminates as 
it is considered too long for the time turn. The communication 
between two agents may also terminate if either of them starts 
by producing the ‘0’ symbol, if one of them produces the ‘n’ 
symbol, or if they both produce a ‘t’ or ‘h’ symbol. In the 
latter case the agents engage in a prisoners’ dilemma game 
where the outcome of the game depends on the actions of the 
involved agents, i.e. they cooperate if both of them produce 
the symbol ‘h’ otherwise one of them or both of them tries to 
cheat (by producing the symbol ‘t’). 
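
The termination conditions can be summarized in a small Python sketch. Several details are assumptions, since the text leaves them open: the length limit is applied to the total number of symbols produced so far, a ‘0’ is treated as terminating whenever it appears, and the case where only one agent has produced an action symbol is treated as an ongoing exchange. The function name `exchange_status` is hypothetical.

```python
MAX_SYMBOLS = 20       # length limit stated in the text
ACTIONS = {"h", "t"}   # the two action symbols


def exchange_status(last_a, last_b, symbols_so_far):
    """Classify an exchange after the latest symbol of each agent.

    Returns "game" when both agents produced an action symbol, "ended"
    when the exchange stops without a game, and "continue" otherwise.
    """
    if last_a in ACTIONS and last_b in ACTIONS:
        return "game"
    if "0" in (last_a, last_b) or "n" in (last_a, last_b):
        return "ended"
    if symbols_so_far >= MAX_SYMBOLS:
        return "ended"  # too long for the time turn
    return "continue"
```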

When the agents enter the playing of the prisoners’ 
dilemma game they jointly invest their available resources to 
generate new resources. The overall payoff of the game is the 
difference between the sum of the amounts of new resources 
that each agent would have without entering the game and the 
amount of resources that can be generated by using the 
combined current resources of the agents. If an agent cheats 
while the interaction partner is willing to cooperate the 
cheating agent takes the full payoff and the other agent gets 
no extra resources in addition to what it can generate by itself 
with its own available resources. If they both decide to 
cooperate they share the full payoff equally and this gets 
added to the amount of resources that they would generate 
individually. If both agents decide to try to cheat no extra 
resource is allocated to either of the agents. 
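
The payoff allocation described above can be sketched in Python as follows. The resource generation function is not given in the text, so the sketch takes it as a caller-supplied argument (`generate`, a hypothetical name), and it assumes that the payoff is the joint output minus the sum of the individual outputs.

```python
def game_outcome(res_a, res_b, action_a, action_b, generate):
    """Return the expected new resources of agents A and B.

    `generate` maps invested resources to new resources; its form is an
    assumption left to the caller. Actions are "h" (share) or "t" (cheat).
    """
    alone_a = generate(res_a)
    alone_b = generate(res_b)
    # Surplus of the joint investment over the two separate investments.
    payoff = generate(res_a + res_b) - (alone_a + alone_b)
    if action_a == "h" and action_b == "h":
        return alone_a + payoff / 2, alone_b + payoff / 2
    if action_a == "t" and action_b == "h":
        return alone_a + payoff, alone_b
    if action_a == "h" and action_b == "t":
        return alone_a, alone_b + payoff
    return alone_a, alone_b  # both tried to cheat: no extra resources
```

With a purely illustrative superadditive choice such as `generate = lambda r: r ** 2`, mutual sharing leaves both agents better off than acting alone, while a lone cheater gains the most, which is the structure of a prisoners' dilemma.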

The generation of effective new resources is realized in a
probabilistic manner. The actual value is picked from a
uniform distribution where the mean value of the distribution
is given by the calculated value of new resources and the half-width
(equivalent of variance) of the distribution is given by
the environmental risk level ($\sigma$) that characterizes the
simulated world of the agents. Low environmental risk (low
variance) means that the actual value of the new resource is
close to the calculated mean value of the resource value
distribution, while high environmental risk (high variance)
means that the actual value may differ significantly from the
calculated mean value (it can be either much smaller or much
larger).
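
A minimal Python sketch of this noisy resource generation is given below. The text does not say whether the risk level is used directly as the half-width or scaled by the calculated value, so the sketch offers both through the hypothetical flag `scale_by_expected`; the function name is likewise an assumption.

```python
import random


def realized_resources(expected, risk, scale_by_expected=False, rng=random):
    """Draw the actual amount of new resources around the calculated value.

    Assumption: by default `risk` is the half-width itself; with
    scale_by_expected=True the half-width is risk * expected.
    """
    half_width = risk * expected if scale_by_expected else risk
    # Uniform draw centered on the calculated value.
    return rng.uniform(expected - half_width, expected + half_width)
```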

The agents have a memory of their most recent interactions 
with other agents (last ten interacting agents). The memories 
record the outcome of the interactions with these other agents 
and depending on the experience of the agent the probability 
of the resource sharing action of the agent is altered - it gets 
more likely to cooperate again with interaction partners who 
cooperated previously and less likely with those who cheated 
previously (i.e. the probabilities $p$ and $q$ of the rule components
$y, y' \xrightarrow{p} t$ and $y, y' \xrightarrow{q} h$ change - e.g. the latter gets bigger if
sharing gets more likely according to the past experience).

The agents engage in social learning. They select the 
individual with the highest amount of resources in their 
neighborhood as target of imitation - the neighborhood 
consists of the 10 closest other agents. Two kinds of social 
learning approaches have been implemented. In one case the 
agents copy to some extent the communication behavior of 
the imitated agent by setting their communication rule 
probabilities similar to the matching probabilities of the
imitated agent. This is implemented as 

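$$p_{\text{revised}}(U_{\text{current}}, U'_{\text{current}}, U_{\text{new}}) = (1 - \eta) \cdot p_{\text{original}}(U_{\text{current}}, U'_{\text{current}}, U_{\text{new}}) + \eta \cdot p_{\text{imitated}}(U_{\text{current}}, U'_{\text{current}}, U_{\text{new}}) \qquad (3)$$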

where $p(U_{\text{current}}, U'_{\text{current}}, U_{\text{new}})$ is the probability of generating
the symbol $U_{\text{new}}$ by the agent after previously having
generated the symbol $U_{\text{current}}$ and having received the symbol
$U'_{\text{current}}$ from the communication partner, and $\eta$ is the extent
of the fidelity of the imitation. In the second social learning
approach the agent copies fully some of the communication
behaviors of the imitated agent. In this case $\eta$, the extent of
the fidelity of the imitation, is the probability of copying for
all communication rules L (i.e. copying a rule includes the copying
of all related probabilities).
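
The two social learning variants can be contrasted in a short Python sketch. It reads equation (3) as a weighted average of the learner's and the imitated agent's probabilities, and it assumes that both agents hold the same set of rules with the same possible symbols. The names `partial_copy`, `full_copy` and `eta` are chosen for this sketch only.

```python
import random


def partial_copy(own_rules, model_rules, eta):
    """Move every probability toward the imitated agent by the fraction eta."""
    return {
        key: {
            sym: (1 - eta) * p + eta * model_rules[key][sym]
            for sym, p in outcomes.items()
        }
        for key, outcomes in own_rules.items()
    }


def full_copy(own_rules, model_rules, eta, rng=random):
    """Replace each rule as a whole with probability eta, otherwise keep it."""
    return {
        key: dict(model_rules[key]) if rng.random() < eta else dict(outcomes)
        for key, outcomes in own_rules.items()
    }
```

Because a weighted average of two probability distributions is again a distribution, `partial_copy` keeps each rule normalized without further correction; `full_copy` preserves normalization trivially, since rules are only ever taken over whole.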

The agents have a limited life span (60 time turns at most in
the simulations that are reported here - the agents start their
life at a randomly set starting age that is at most 20). When
they reach the end of their life they reproduce asexually, by
generating potentially mutated offspring which inherit the
communication rules with possible small changes to the
relevant probabilities. The number of offspring depends on
the resources available to the agent ($\rho$) at the time of death
and it is determined by the equation

$$n = \left[ \beta \cdot \frac{\rho - \rho_{\text{mean}}}{\rho_{\text{stdev}}} + \gamma \right] \qquad (4)$$

where $\rho_{\text{mean}}$ and $\rho_{\text{stdev}}$ are the mean and standard deviation of
the resource across the whole agent community at the time
when the offspring is generated, $\beta$ and $\gamma$ are parameters, and
$[\cdot]$ is the integer part function ($\beta = 1.5$, $\gamma = 7.5$). We also
capped the number of offspring, i.e. if $n > n_{\max}$ then the
number of offspring is $n_{\max}$ ($n_{\max} = 15$). If the above
calculation gives $n < 1$ then the agent has no offspring.
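
The offspring rule can be written as a small Python function. The sketch reads equation (4) as the integer part of a scaled, standardized resource level plus an offset, with the two parameter values 1.5 and 7.5 named `beta` and `gamma` here; the handling of a zero standard deviation is an added assumption that the text does not cover.

```python
import math


def offspring_count(resources, mean, stdev, beta=1.5, gamma=7.5, n_max=15):
    """Number of offspring of a dying agent from its standardized resources."""
    if stdev == 0:
        z = 0.0  # assumption: with no spread every agent counts as average
    else:
        z = (resources - mean) / stdev
    n = math.floor(beta * z + gamma)  # integer part
    if n < 1:
        return 0  # no offspring
    return min(n, n_max)  # cap stated in the text
```

Under this reading an agent holding exactly the community mean has 7 offspring, and the cap of 15 is reached at five standard deviations above the mean.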

The offspring of the agent may be spread closely around 
the location of their parent or may get widely dispersed in the 
full extent of the two dimensional world in which the agents 
exist. The first offspring location option may create clumps of 
cooperating agents, while the second option prevents this. We 
implemented both options of placing of the offspring of dying 
agents. 

More details about the simulated agent world described 
above can be found in Andras et al (2003), Andras et al 
(2006) and Andras (2008). The code developed in Delphi for 
the implementation of the simulated agent worlds is available 
on request from the author. 


Results and Analysis 

We considered the following six simulation scenarios: (I)
partial copying of all rules without wide dispersion of the
offspring; (II) partial copying of all rules with wide dispersion
of the offspring; (III) full copying of some rules without wide
dispersion of the offspring; (IV) full copying of some rules
with wide dispersion of the offspring; (V) no social learning
and without wide dispersion of offspring; (VI) no social
learning and with wide dispersion of offspring. For all
scenarios with social learning we considered two variants with
low and high levels of copying ($\eta$), i.e., $\eta = 0.2$ and $\eta = 0.8$.
We ran 20 simulations for five levels of environmental risk ($\sigma$
= 0.1, 0.3, 0.5, 0.7, 0.9) for each variant of the simulation
scenarios. All data shown in the figures are averages of 20
simulation runs; the standard deviations are small and are not
shown, to avoid cluttering the figures.

Figure 1. The evolution of the level of cooperation and the
level of language complexity for agent communities without
wide dispersion of offspring and social learning by partial
copying of all language rules (scenario I): A) level of
cooperation with $\eta = 0.2$; B) level of cooperation with $\eta = 0.8$;
C) level of language complexity with $\eta = 0.2$; D) level of
language complexity with $\eta = 0.8$; where $\eta$ is the level of
copying.

In the case of scenarios without wide dispersion of 
offspring the starting size of the agent population is 1,800, 
while in the case of scenarios with wide dispersion of 
offspring the populations have 7,500 individuals at start. In 
scenarios with wide dispersal of the offspring the likelihood 
that an agent dies without offspring is higher than in scenarios 
with closely located offspring. Thus in scenarios with wide 
dispersal of the offspring the likelihood that a smaller agent 
population goes extinct is relatively high. For this reason the 
population size was increased in these scenarios. Larger
population sizes make the simulations slower to run but do not
influence the nature of the results presented here.

We measured the level of cooperation (c) by calculating the 
percentage of agents that engage in a cooperation interaction 
(i.e. both agents communicate the symbol ‘h’ at the end of 
their interaction) among all agents in the current agent 
population. 
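
The cooperation measure can be sketched in Python as follows. The input format, one pair of final symbols per interaction in the current turn, and the name `cooperation_level` are assumptions made for illustration.

```python
def cooperation_level(final_symbols, population_size):
    """Percentage of agents whose interaction ended in mutual sharing.

    `final_symbols` holds one (symbol_a, symbol_b) pair per interaction.
    """
    if population_size == 0:
        return 0.0
    # Each mutually cooperative interaction involves two cooperating agents.
    cooperating = 2 * sum(1 for a, b in final_symbols if a == "h" and b == "h")
    return 100.0 * cooperating / population_size
```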

We also measured the complexity of the agent language.
For this purpose we considered all language rules $L_r$, $r = 1, \ldots, R$
(in the presented agent world simulations we had R = 2
language rules) and all corresponding probabilities $p_j^r$,
$j = 1, \ldots, k_r$, and calculated the variance of the values for each of [...]
where $K = \sum_{r=1}^{R} k_r$. This language complexity measure is
inspired by the concept of Kolmogorov complexity (Li and
Vitanyi, 1997) in the sense that more variable application of
the language rules (higher variance of the corresponding
probability values) requires a longer description of the
language than the description of a language with the same
number of rules but less variable application of the rules.

Figure 2. The evolution of the level of cooperation and the
level of language complexity for agent communities without
wide dispersion of offspring and social learning by full
copying of some language rules (scenario III): A) level of
cooperation with $\eta = 0.2$; B) level of cooperation with $\eta = 0.8$;
C) level of language complexity with $\eta = 0.2$; D) level of
language complexity with $\eta = 0.8$; where $\eta$ is the level of
copying.

We expect that allowing the agents to use social learning 
increases the steady-state level (i.e. after many time turns, 
when this level gets stabilized) of cooperation in agent 
communities due to the copying of successful neighboring 
agents who are expected to be the ones that often cooperate. It 
is also expected that social learning will reduce the 
complexity of the language across the agent community, again 
due to the copying of language rules between agents. 

First we considered the scenarios without wide dispersion 
of the offspring of the agents - scenarios (I), (III) and for 
reference also scenario (V). For both variants of social 
learning we analyzed the evolution of the level of cooperation 
and of the communication complexity for low and high levels 
of behavioral copying. The results are shown in Figures 1 and 
2. These confirm that for both kinds of social learning the
steady-state level of cooperation grows with the level of
environmental risk, similarly to previously reported results
(Andras et al, 2003; Andras et al, 2007). Also, similarly to
previous results (Andras, 2008) the results show that the 
steady-state language complexity decreases with the 
environmental risk. 

Increased level of copying in social learning leads to
smaller differences in terms of steady-state levels of
cooperation for different levels of environmental risk; for
example, the levels for $\sigma = 0.7$ and $\sigma = 0.9$ are no longer
statistically significantly different for either kind of
social learning. On the other hand, increased level of copying
in social learning leads to increased differences between the
steady-state levels of language complexity associated with
different levels of environmental risk.

Figure 3. The evolution of the level of cooperation and the
level of language complexity for agent communities without
wide dispersion of offspring and social learning by partial
copying of all language rules (scenarios I and V): A) level of
cooperation with $\sigma = 0.3$; B) level of cooperation with $\sigma = 0.7$;
C) level of language complexity with $\sigma = 0.3$; D) level of
language complexity with $\sigma = 0.7$; where $\sigma$ is the level of
environmental risk.

Figure 4. The evolution of the level of cooperation and the
level of language complexity for agent communities without
wide dispersion of offspring and social learning by full
copying of some language rules (scenarios III and V): A) level
of cooperation with $\sigma = 0.3$; B) level of cooperation with $\sigma = 0.7$;
C) level of language complexity with $\sigma = 0.3$; D) level of
language complexity with $\sigma = 0.7$; where $\sigma$ is the level of
environmental risk.

Figure 5. The evolution of the level of cooperation and the
level of language complexity for agent communities with wide
dispersion of offspring and social learning by partial copying
of all language rules (scenario II): A) level of cooperation
with $\eta = 0.2$; B) level of cooperation with $\eta = 0.8$; C) level of
language complexity with $\eta = 0.2$; D) level of language
complexity with $\eta = 0.8$; where $\eta$ is the level of copying.

We also note that an impact of the social learning is that at
the beginning (until past 120 time turns) the ordering of the
language complexity levels associated with risk levels is
reversed, i.e. low risk level implies low language complexity.
In the absence of social learning the steady-state ordering of
risk level associated language complexity levels is already
established by around 80 time turns (see Figures 3 and 4). The
time point by which the steady-state ordering of language
complexity levels emerges changes with the level of copying.
Interestingly in the case of social learning with partial copying 
of language rules, higher extent of copying implies delaying 
this time point, while in the case of social learning with full 
copying of some rules, the increase in the extent of copying 
makes this time point earlier. 

Next we compared the levels of cooperation and language
complexity for different extents of copying in the two kinds of
social learning for two fixed levels of environmental risk, $\sigma = 0.3$
and $\sigma = 0.7$. The results are shown in Figures 3 and 4.

The results indicate that social learning at small extent of
copying does not change the level of cooperation. However, at
larger extent of copying the impact is a statistically significant
(t-test, p=0.05) increase in the level of cooperation. In terms
of language complexity both kinds of social learning have a
major effect, reducing the level of language complexity earlier
and by a considerable extent. Interestingly this effect is
larger at lower level of environmental risk, and in the case of
social learning by partial copying of all language rules the
increase in the level of copying weakens this reduction of
the language complexity.

Figure 6. The evolution of the level of cooperation and the
level of language complexity for agent communities with wide
dispersion of offspring and social learning by full copying of
some language rules (scenario IV): A) level of cooperation
with $\eta = 0.2$; B) level of cooperation with $\eta = 0.8$; C) level of
language complexity with $\eta = 0.2$; D) level of language
complexity with $\eta = 0.8$; where $\eta$ is the level of copying. The
blue and red-purple lines stop early in B) and D) due to the
early growth of the simulated populations beyond the
population size limit.

Next we considered the simulation scenarios with wide
dispersion of the offspring - scenarios (II), (IV) and, for
reference, (VI). The wide dispersion of the offspring reduces in
general the level of cooperation in the agent communities. For
agent communities without social learning, however, the ordering
of the levels of steady-state cooperation associated with levels
of environmental risk remains the same as in the case without
wide dispersion of the offspring.

For both kinds of social learning that we implemented we
found that the steady-state levels of cooperation associated
with levels of environmental risk do not follow the ordering
pattern found without social learning or with social learning
but without wide dispersal of the offspring. In the cases of
social learning with wide dispersal of the offspring lower
environmental risk leads to higher level of cooperation - the
difference becomes statistically significant for higher extent
of copying in the social learning. In terms of language
complexity again the ordering of the steady-state levels is the
reverse of the ordering that we found for scenarios without
wide dispersion of the offspring. Lower environmental risk
implies lower language complexity in the case of agent
societies with widely dispersed offspring and either form of
social learning that we implemented. The results are shown in
Figures 5 and 6.

Figure 7. The evolution of the level of cooperation and the
level of language complexity for agent communities with wide
dispersion of offspring and social learning by partial copying
of all language rules (scenarios II and VI): A) level of
cooperation with $\sigma = 0.3$; B) level of cooperation with $\sigma = 0.7$;
C) level of language complexity with $\sigma = 0.3$; D) level of
language complexity with $\sigma = 0.7$; where $\sigma$ is the level of
environmental risk.

The results show that higher extent of copying in social 
learning implies an increase in the steady-state level of 
cooperation for all levels of environmental risk for both kinds 
of social learning and this effect is stronger in the case of 
social learning with full copying of some language rules. 
Similarly, higher extent of copying in social learning increases 
the effect of environmental risk on the steady-state level of 
language complexity (i.e. the distinction between steady-state 
level of language complexity for high and medium level 
environmental risk becomes clearer). Again, the effect is more 
accentuated for the social learning with full copying of some 
language rules. We also note that the steady-state level of 
language complexity is lower for all levels of environmental 
risk in the case of social learning with partial copying of all 
language rules. The evolution of language complexity shows a 
wavy nature in all cases considered here, which is likely to be 
due to a generational effect (each generation of agents lasts 
for around 60 time units). 

Further, we considered again two fixed levels of
environmental risk ($\sigma = 0.3$ and $\sigma = 0.7$) and compared the
corresponding levels of cooperation and language complexity 
for different extents of copying in the two kinds of social 
learning. The results are presented in Figures 7 and 8. 

The results show that at lower level of environmental risk
both kinds of social learning increase the level of cooperation
relative to the case with no social learning. Notably even at
higher levels of environmental risk, at the initial part of the
evolution of the agent community the level of cooperation
increases with the extent of copying in social learning. For
both kinds of social learning, higher extent of copying leads to
higher level of cooperation at both environmental risk levels.

Figure 8. The evolution of the level of cooperation and the
level of language complexity for agent communities with wide
dispersion of offspring and social learning by full copying of
some language rules (scenarios IV and VI): A) level of
cooperation with $\sigma = 0.3$; B) level of cooperation with $\sigma = 0.7$;
C) level of language complexity with $\sigma = 0.3$; D) level of
language complexity with $\sigma = 0.7$; where $\sigma$ is the level of
environmental risk. The olive-green line stops early in A) and
C) due to the early growth of the simulated populations
beyond the population size limit.

In terms of language complexity, again both kinds of social 
learning lead to a significant drop in comparison with the case 
with no social learning. This effect is much larger at the lower 
level of environmental risk. Higher extent of copying in social
learning leads to smaller steady-state language complexity at
the lower level of environmental risk; at the higher level of
environmental risk the same effect is smaller. As we already
noted, for both kinds of social learning higher level of 
environmental risk implies higher steady-state language 
complexity. The level of language complexity is lower for the 
social learning with partial copying of all language rules than 
for the social learning with full copying of some language 
rules for both considered values of extent of copying and for 
both considered levels of environmental risk. 


Discussion and Conclusions 

Our results show that in the simulated agent communities
social learning has more effect on the level of cooperation and
the level of language complexity at a low level of environmental
risk than at a high level of environmental risk. This difference is
more accentuated in the case of simulations with wide
dispersal of the offspring of agents.

We found that a low extent of social learning does not increase
the level of cooperation and in the case of high environmental
risk it may even reduce the level of cooperation. The extent
of social learning influences the level of language complexity
in all cases. More social learning leads to lower language
complexity more quickly in the context of low environmental risk.
A very interesting result is that in the case of simulations
with wide dispersal of the offspring adding social learning to 
the simulations reversed the ordering of levels of cooperation 
and language complexity associated with levels of 
environmental risk, compared to the case without social 
learning. There is no such effect if the offspring of the agents 
are not dispersed widely in the space where the agents live. 

The results suggest that social learning is most impactful in 
terms of supporting cooperation and reducing language 
complexity in the context of low environmental risk 
situations. High environmental risk situations support the 
emergence of relatively high level of cooperation and low 
level of language complexity even in the absence of social 
learning (Krams et al, 2010; Andras et al, 2007; Rand et al, 
2014; Potts and Faith, 2015). Thus it is possible that animal or 
human populations develop high level of cooperation in harsh 
and risky environments without relying much on social 
learning, and these populations get to even higher level of 
cooperation and lower level of language complexity as they 
move to less harsh and less risky environments. 

The results also suggest that social learning gets a much 
more significant role in communities where related 
individuals get dispersed widely in the community. In close-knit
communities where kin are likely to stay close to each
other the simulation results suggest that the impact of social
learning is mainly in terms of reducing the language
complexity within the community.

The simulation data indicate that social learning may reduce
the level of cooperation or increase the level of language
complexity in high risk environments, and that in general it
may have little effect on the level of cooperation when the
extent of social learning is small. This suggests that social
learning has the potential to reduce cooperation in some
settings (especially high environmental risk situations).
This fits well with some of the experimental observations and
theoretical explorations about how social learning may
negatively influence the disposition of humans towards
cooperation (Molleman et al, 2013; Burton et al, 2015).

In general the results presented here suggest that social
learning and environmental risk may take alternating roles in
driving animals and humans towards communities that rely
increasingly on cooperation among individuals. High
environmental risk is the first driver to a higher level of
cooperation in the community of individuals. Following a move
to a low risk environment, social learning takes over as the
driver toward more cooperation and lower language complexity.
A high level of cooperation in a low risk environment
combined with social learning may lead to the emergence of
novel social structures that add new risks to the environment
and also increase the language complexity (Boyce et al,
2012). This may lead to a new high risk environment, which in
turn facilitates further cooperation in the evolving
community. Next, with the maturation of the previously new
social structures, the environmental risk may be reduced and
the community may experience a new bout of increase in
cooperation due to social learning. In this way the evolving
community may increase the level of cooperation and the
extent of social institutions in steps driven alternately by
high environmental risk and social learning. Investigating
the generation of novel social structures in agent-based
simulations of communities will be part of future work.
Abstract

Sexual reproductive behavior has a necessary social
coordination component, as willing and capable partners must
both be in the right place at the right time. It has recently
been demonstrated that many social organizations that support
sexual reproduction can evolve in the absence of social
coordination between agents (e.g. herding, assortative mating,
and natal philopatry). In this paper we explore these results
by adding social transfer mechanisms to our agents and
contrasting their reproductive behavior with a control group
without social transfer mechanisms. We conclude that
behaviors similar to those of the non-social learning agents
emerge in our social learning agents. Social learners were
more inclined towards natal philopatry. Social learners also
evolved a culture of eusociality including reproductive
division of labor.

Introduction 

Sexual reproduction is a social behavior, as two able
participants must coordinate their behaviors as well as their
positions in time and space. This social coordination problem
is solved by sexually reproducing species in many different
ways.

Some of the behaviors that enhance finding and attracting a
mate include herding (Reynolds, 1987), philopatry
(Clutton-Brock and Lukas, 2012), assortative mating (Jiang
et al., 2013) and eusociality. These behaviors can arise
through social mechanisms or non-social mechanisms (Whiten
and Ham, 1992).

For instance, consider witnessing a herd of animals crossing
a plain to drink water from the river. A well known
explanation of the herd is that the animals in the herd follow
simple social rules of cohesion, alignment and separation
(Reynolds, 1987; Grünbaum and Okubo, 1994; Parrish and
Edelstein-Keshet, 1999). These are social rules and social
mechanisms because to follow these rules the agents must be
aware of each other and make decisions based on information
about others’ states. However, a non-social explanation might
be that all the animals were getting thirsty in the sun and
independently navigated around obstacles to the river. With
this second explanation, the herd is maintained by the mutual
response of the individuals in the herd with no need for
social awareness or exchange of information.

Prior work (from now on, references to the prior work mean
Marriott and Chebib (2015a,b)) showed that herding,
philopatry, and assortative mating arose through non-social
mechanisms of convergence and common descent. This work
raised a few questions regarding the role that social
interaction plays in many of these observed behaviors. These
mating behaviors can be explained by both non-social and
social mechanisms. In many cases the non-social solution is
the simpler one, though it is likely that certain instances
of these behaviors indeed rely on underlying social
mechanisms. Further, it is not clear how these behaviors vary
relative to the nature of the underlying mechanisms.

We have augmented the non-social agents from the prior work
with both individual learning and social learning
capabilities. Our null hypothesis is that the same breeding
structures and organizations will be observed as before.
However, we expect that the new adaptive mechanisms will
impact these organizations quantitatively and may lead to
other organizations. Our current work can be compared
directly to the prior work, but because of implementation
differences between the models we have also conducted control
tests of our own. These control tests involve the same agents
but under conditions in which social learning and/or
individual learning is unavailable.

Social Animals 

Animals display different levels of social behavior
(Michener, 1969). Social behavior is often centered on
reproductive activities and caring for the young (Trivers,
1972). Some social animals extend this social behavior to
other activities like hunting, foraging, and grooming
(Lovegrove and Wissel, 1988; Boesch, 1994; Creel and Creel,
1995; Nakamura, 2003). The highest categorization of social
behavior in animals is eusocial (Crespi and Yanega, 1995).
Ants, bees, termites, and some mole rats are categorized as
eusocial (Wilson and Holldobler, 2005). Humans, of course,
are also very social and loosely fit under the eusocial
definition (Nowak et al., 2010).

On the opposite side of the spectrum are non-social animals.
Non-social animals engage in the minimal amount of social
activity required of a sexually reproducing species, which is
to mate. After mating the mother lays her eggs or gives birth
to her young and leaves them to fend for themselves (Starck,
1998). All parental investment in child rearing comes prior
to birth.

To be classified as eusocial, animals must satisfy three
conditions. Eusocial animals share responsibility for caring
for their young, have reproductive division of labor, and
have multi-generational communal cohabitation (and also
philopatry in some definitions (Burda et al., 2000)). This
means that parents live with adult children and older
generations help to care for their grandchildren and others.
This, in particular, allows for sharing of learned
information across generations.

We can see that humans loosely fit into this definition. We
certainly share responsibility for child rearing, and we
certainly have multi-generational communal cohabitation.
However, we do not have reproductive division of labor in the
sense we normally think of. That is, we don’t have a single
“queen” that births us all after breeding with a few
privileged male courtiers. Even so, many still like to apply
the label eusocial to humans while some prefer to keep humans
in a category of their own. This debate is not critical to
our discussion.

Model 

Our simulation consists of agents in a random geometric
network of resource sites. Each day the agents in a
population expend energy to move from site to site, forage
for resources, and engage in mating, learning and social
learning. Energy in the simulation normally corresponds to
the time an agent can spend doing activities during a day,
but excess stored energy is also used to reproduce. The
resources gathered during a day determine the energy an agent
has for activities in the next day. The net daily energy gain
or loss determines whether an agent lives or dies (if the
energy is depleted) and whether an agent is capable of
reproduction (if stored energy exceeds a threshold).
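
The daily energy bookkeeping described above can be sketched
as follows. This is a simplified illustration and not the
code used in the simulation: the class name, the method
names, the single energy store and the threshold value of 50
are all assumptions introduced here.

```python
from dataclasses import dataclass

# Hypothetical value; the text only states that a threshold exists.
REPRODUCTION_THRESHOLD = 50


@dataclass
class Agent:
    stored_energy: int
    alive: bool = True

    def end_of_day(self, gathered: int, spent: int) -> None:
        """Apply the net daily gain or loss to the energy store."""
        self.stored_energy += gathered - spent
        if self.stored_energy <= 0:
            # Energy depleted: the agent dies.
            self.alive = False

    def can_reproduce(self) -> bool:
        """An agent may breed only while alive and above the threshold."""
        return self.alive and self.stored_energy > REPRODUCTION_THRESHOLD
```

For example, an agent with 48 stored energy that gathers 9
and spends 5 ends the day at 52 and becomes able to
reproduce under the assumed threshold.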

Agents in our simulation implement the dual inheritance
model (Marriott and Chebib, 2014, 2016b). The dual
inheritance model incorporates three modes of adaptation:
phylogenetic, ontogenetic, and sociogenetic. That is, agents
engage in genetic evolution, individual learning and social
learning. Figure 1 shows how genetic and cultural information
are stored and transferred in the dual inheritance model.

Genetic and cultural information are stored in separate
information stores called genomes and memomes, respectively.
Genomes are inert within the lifetime of an agent, meaning
they remain static and are not active in behavior selection.
Their only purpose is to create memomes upon birth and to be
replicated in reproductive events. When an agent is born its
memome is created by copying segments of the genome. This is
a process of development that results in the agent’s initial
cultural information.

Figure 1: Dual Inheritance Model

Memomes are active in behavior selection over the lifetime
of an agent, and they can also be altered through individual
learning and social learning events. The memome selects
behaviors through interaction with the environment and these
choices shape the phenotype of the agent. The phenotype is
used for selection, and thus an agent’s reproductive success
and survival depend upon its memome, its genome, and its
environment.

The cognitive model of our agents is inspired by the
pandemonium model (Jackson, 1987; Franklin, 1997). The mind
of a single agent in our simulation consists of many
specialized sub-agents, or daemons, that compete for control
of the agent. In our model we call these sub-agents
memeplexes, and this internal competition is an evolutionary
competition.

Genome 

The possible behaviors of agents are encoded in their
genomes as a path of resource sites in the network (our model
extends the model from Marriott and Chebib (2015a,b)). We
consider each site in the path a gene in the genome. A gene
has three components: gathering, non-gathering, and travel.
At each site an agent has an opportunity to gather resources,
perform non-gathering actions like breeding, learning, and/or
socializing, and finally must travel to the next site.

In our current model an agent always gathers resources when
it visits a site. When not depleted, a site will always
return a fixed number of resources to an agent gathering at
that site. The energy that an agent spends on gathering is
determined by the agent’s strategy, which is encoded in the
gene corresponding to the site. The energy expended in
gathering is always at least the number of resources gathered
(one, two or three) and at most five.

A gene for a site also encodes how much time (i.e. energy)
is spent performing breeding, learning or socializing
actions. Typically the time spent performing these actions is
considerably less than the time spent gathering resources.
Actions at a site are performed in the following order:
gathering, breeding, learning, socializing, and finally
traveling to the next site.

Agents can engage in sexual reproduction when they have an
energy total above a reproduction threshold. In addition,
they need to find another willing and able participant at the
same site. If two agents are engaged in breeding actions at
the same site at the same time and they both have sufficient
stored energy then they will engage in sexual reproduction.
If no mate is found then agents will wait until the next
opportunity to mate. If the breeding action is unsuccessful
an agent must still spend the energy cost of the breeding
action but not the cost of reproduction.
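
The mating rule above amounts to pairing agents that attempt
to breed at the same site and time and that both hold enough
stored energy. A minimal sketch follows. It assumes that the
day is divided into discrete time slots and that agents are
paired in order of arrival; the function name and the tuple
layout are hypothetical and are not taken from the model.

```python
from collections import defaultdict


def match_breeders(attempts, threshold):
    """Pair agents that attempt to breed at the same site and time.

    attempts: iterable of (agent_id, site, time_slot, stored_energy).
    Returns a list of (agent_id, agent_id) pairs that reproduce.
    """
    waiting = defaultdict(list)  # (site, time_slot) -> unmatched agent ids
    pairs = []
    for agent_id, site, time_slot, energy in attempts:
        if energy <= threshold:
            # Willing but not able: only the action cost would be paid.
            continue
        queue = waiting[(site, time_slot)]
        if queue:
            pairs.append((queue.pop(0), agent_id))
        else:
            queue.append(agent_id)
    return pairs
```

Agents left in a queue at the end of the day found no mate.
In the model they still pay the cost of the breeding action
but not the cost of reproduction.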

A genome is passed to the offspring during reproduction. In
this process, the two parents’ genomes are recombined into a
single genome and this genome is given the opportunity to
mutate. A genome remains inert during the lifetime of an
agent other than during reproductive events.

At birth each agent’s genome creates a memome. A memome
consists of a set of memeplexes. Each memeplex in our model
represents a possible set of activities for a day. Memeplexes
are formed by copying segments of a genome. Starting at each
gene (i.e. site) in a genome we copy gene by gene (site by
site) into a memeplex (the copies are called memes in a
memeplex instead of genes). This continues until the total
energy of a segment approaches the maximum energy available
to an agent for a single day. If copying the next gene would
exceed the maximum energy the segment is complete and is
stored as a memeplex. This means each memeplex represents a
possible set of actions for our agent in a single day. A
memeplex is stored in a memome along with other memeplexes
starting with the same initial site for behavior selection
(see below).

A single memeplex is formed starting at each gene in a
genome. Copying occurs in the forward direction until the
maximum energy is reached. If the end of a genome is reached
copying continues backwards until the maximum energy is
reached. Additionally, segments are copied in a backwards
direction from every gene. This means every gene in a genome
is responsible for two memeplexes in its memome, except the
endpoints, which are responsible for only a single memeplex.
Notice that since not every site in the environment is
necessarily represented in a genome there may be sites that
do not have corresponding memeplexes.
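
The development step described above can be sketched as
follows. The sketch makes several assumptions that are not
stated in the text: a gene is reduced to a pair of site id
and total energy cost, copying reverses direction when it
runs off either end of the genome, and identical segments
are stored only once (which is how the endpoints end up with
a single memeplex here). All names are hypothetical.

```python
from typing import Dict, List, Tuple

Gene = Tuple[int, int]  # (site id, total energy cost at that site)


def copy_segment(genome: List[Gene], start: int, step: int,
                 max_energy: int) -> List[Gene]:
    """Copy genes from `start` in direction `step` (+1 or -1)."""
    segment, total, i = [], 0, start
    while True:
        cost = genome[i][1]
        if cost <= 0:
            raise ValueError("gene costs must be positive")
        if total + cost > max_energy:
            break  # the next gene would exceed the daily budget
        segment.append(genome[i])
        total += cost
        nxt = i + step
        if not 0 <= nxt < len(genome):
            step = -step  # assumed: reverse at the end of the genome
            nxt = i + step
            if not 0 <= nxt < len(genome):
                break  # single-gene genome
        i = nxt
    return segment


def develop_memome(genome: List[Gene],
                   max_energy: int) -> Dict[int, List[List[Gene]]]:
    """Build a memome: memeplexes grouped by their starting site."""
    memome: Dict[int, List[List[Gene]]] = {}
    for start in range(len(genome)):
        for step in (1, -1):
            segment = copy_segment(genome, start, step, max_energy)
            if not segment:
                continue
            plexes = memome.setdefault(segment[0][0], [])
            if segment not in plexes:
                plexes.append(segment)
    return memome
```

With a three-gene genome in which every gene costs 2 and a
daily budget of 4, the middle gene yields a forward and a
backward memeplex while each endpoint yields one.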

Memome 

As mentioned above, memeplexes in a memome are arranged by
starting site. The model bears similarities to the MAP-Elites
strategy of multi-objective evolutionary optimization (Mouret
and Clune, 2015). For each site in the environment a memome
will contain zero or more memeplexes that start at that site.
Each memeplex consists of a path of gathering sites that
begins at a particular site and takes at most the maximum
energy (time) available to an agent for a single day. This
means each memeplex encodes a possible course of actions for
an agent for one day starting at a particular site. From the
perspective of the pandemonium model, each memeplex is a
daemon representing a single day’s activities.

At the beginning of each day an agent must select a
memeplex that will serve as its behavioral plan for that day.
Only the memeplexes that begin at the current site are
possible, so only these memeplexes are considered when
selecting behavior. Since memeplexes in a memome are
organized by starting site, behavior selection begins by
activating all memeplexes that start at an agent’s current
site. The agent selects the memeplex from this set that will
maximize expected resource gain while minimizing expended
energy. This memeplex is then used to determine the actions
of the agent for that day. These actions interact with the
environment to reward the agent with resources, which serves
as a selective force on the agent.
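
Behavior selection as described above can be sketched in a
few lines. The text does not say how expected gain and
expended energy are combined, so the score used here
(expected resources minus energy, summed over the memes) is
an assumption, as are the function name and the meme fields.

```python
def select_memeplex(memome, current_site):
    """Pick the plan for today among memeplexes starting at current_site.

    memome: dict mapping a starting site to a list of memeplexes.
    Each memeplex is a list of memes; each meme is a dict with the
    assumed keys "expected_resources" and "energy".
    """
    candidates = memome.get(current_site, [])
    if not candidates:
        return None  # no memeplex starts at this site

    def net_gain(memeplex):
        # Assumed scoring rule: expected resources minus energy spent.
        return sum(m["expected_resources"] - m["energy"] for m in memeplex)

    return max(candidates, key=net_gain)
```

A lexicographic rule (most resources first, least energy as
a tie-break) would be an equally plausible reading of the
text.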

A memome is not only active in behavior selection but also
adaptive over the lifetime of an agent. It serves as the
information storage for both individual and social learning.
This means that additional memeplexes are added to an agent’s
memome as the agent interacts with its environment and other
agents.

If the memeplex determining an agent’s actions contains 
a meme with a non-zero learning component, then the agent 
will engage in learning during its day. During individual 
learning we apply an evolutionary process. We clone and 
mutate the memeplex selected for this day’s actions and add 
it to the agent’s memome. 
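
The individual learning step can be sketched as a clone and
mutate operation on the day's memeplex. The mutation
operator is not specified in the text, so the one below
(shifting one component of one meme by one unit) is purely
illustrative, and the meme field names are assumptions.

```python
import copy
import random

COMPONENTS = ("gather", "breed", "learn", "social")  # assumed meme fields


def mutate(memeplex, rng):
    """Hypothetical operator: shift one component of one meme by +/-1."""
    meme = rng.choice(memeplex)
    key = rng.choice(COMPONENTS)
    meme[key] = max(0, meme[key] + rng.choice((-1, 1)))


def learn_individually(memome, selected, rng):
    """Clone and mutate today's memeplex if it has a learning component."""
    if not any(meme["learn"] > 0 for meme in selected):
        return None
    clone = copy.deepcopy(selected)
    mutate(clone, rng)
    # Memeplexes are filed under the site at which they start.
    memome.setdefault(clone[0]["site"], []).append(clone)
    return clone
```

The original memeplex is left untouched, so the agent keeps
both variants and behavior selection decides between them on
later days.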

Social learning is handled in the same way as breeding in
our agents. If an agent wants to engage in social learning it
must spend time seeking a partner for exchange. If another
agent is also seeking a social learning partner at the same
time and at the same site, then the two agents engage in an
exchange of memeplexes. Each agent copies and exchanges the
memeplex it used for that day. In transfer, memeplexes are
mutated, so noise is added to the system in this step. The
social learning mechanism allows agents to pass their learned
memeplexes on to others in the population. Agents are only
allowed a single transfer in a day.
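
A sketch of the bidirectional exchange follows. It assumes
that the two agents have already been matched at the same
site and time. The noise model (a per-meme chance of
shifting the social component by one unit), the mutation
rate and all names are assumptions made for illustration.

```python
import copy
import random
from dataclasses import dataclass, field


@dataclass
class Agent:
    todays_memeplex: list  # list of meme dicts used as today's plan
    memome: dict = field(default_factory=dict)
    transferred_today: bool = False


def noisy_copy(memeplex, rng, rate=0.1):
    """Copy a memeplex, perturbing memes with probability `rate`."""
    clone = copy.deepcopy(memeplex)
    for meme in clone:
        if rng.random() < rate:
            meme["social"] = max(0, meme["social"] + rng.choice((-1, 1)))
    return clone


def exchange(a, b, rng):
    """Swap mutated copies of today's memeplexes, once per day."""
    if a.transferred_today or b.transferred_today:
        return False
    for giver, taker in ((a, b), (b, a)):
        plex = noisy_copy(giver.todays_memeplex, rng)
        taker.memome.setdefault(plex[0]["site"], []).append(plex)
    a.transferred_today = b.transferred_today = True
    return True
```

Resetting `transferred_today` at the start of each day is
left to the surrounding simulation loop.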
Experimental Setup

Simulations are run under two conditions: experimental and
control. We ran both conditions 130 times in a variety of
different environments. The experimental simulation consists
of a population of agents with all of the mechanisms
described above. The agents in this group are called
socializers. There were initially two control populations,
one in which agents can learn individually but not socially
(where agents are called learners), and one in which agents
can learn neither individually nor socially (where the agents
are called breeders). Learners did not perform significantly
differently from breeders in our experiments, so our analysis
will focus on differences between socializers and breeders.

Each simulation run in our current experiment begins with a
population of one hundred agents with randomly generated
genomes. The genes generated at random in this initialization
phase have a chance of having a non-zero breeding component
to allow the initial population to breed. However, all
learning and social learning components of randomly generated
genes are set to zero. They can only become non-zero through
mutation.

Simulation runs are seeded with one hundred randomly
generated individuals. Since they are randomly generated, any
genetic similarity occurs by chance alone. As shown in prior
work, it is a rare but fortunate occurrence if two randomly
generated agents end up performing breeding actions at the
same site at the same time (Marriott and Chebib, 2015b).
Since this is the criterion for breeding in our model, our
initial population has a risk of not being viable. A run is
seeded with one hundred agents so that this risk is
diminished. Under these settings every seed population for a
simulation run was viable.

In every run a small proportion of the initial population
is fortunate enough to reproduce. Once initiated, these
colonies tend to quickly become viable due to forces of
common descent. That is, since the offspring of these
reproductive events are related to their parents there is a
high chance that they will perform similar behaviors and,
importantly, breed at the same time and place as their
parents and others in their genetic family. This helps make
the fledgling colony viable.

Each run lasts 5000 days. During the run some data was
gathered continuously and other data was sampled every 50
days. For every reproductive event, relatedness of the
parents is measured along both genetic and phenotypic lines.
This allows us to monitor the degree of genetic and
phenotypic assortative mating present in a run. Further,
whenever an agent dies, the number of offspring the agent had
is recorded as well as other characteristics of the agent’s
breeding history (like breeding sites and breeding
partners).

In addition to this continuously gathered data the
population is also censused every 50 rounds. During
censusing, the population size is recorded as well as many
statistics from every agent alive during that day including:
the agent’s age, genome length, generation, memeplex
generation, as well as the concentration of
breeding/learning/socializing components in the agent’s
genome and memome.

Figure 2: Average and maximum generation of active
memeplexes over time in socializers with one standard
deviation around the mean.

Observations and Discussion 

We indeed saw evidence of herding, assortative mating and
philopatry as expected. In some cases, there are quantitative
differences of note. However, the more exciting result is the
emergence of eusociality in our socializers. Before
discussing eusociality let’s review some evidence of social
learning in our agents.

Social Learning and Cultural Evolution. In order to track
social learning and cumulative cultural evolution we assigned
each memeplex a generation. Initial memeplexes created at
birth are assigned generation zero. When a memeplex is cloned
the new memeplex has a generation one greater than its
parent. Cloning occurs only during learning and social
learning.
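
The generation counter can be illustrated with a small
sketch. The class and function names are hypothetical, and
the memeplex content is reduced to a tuple of site ids for
brevity.

```python
from dataclasses import dataclass, replace
from statistics import mean


@dataclass(frozen=True)
class Memeplex:
    path: tuple          # site ids visited during the day (simplified)
    generation: int = 0  # memeplexes created at birth start at zero


def clone(memeplex):
    """A clone is one generation older than its parent memeplex."""
    return replace(memeplex, generation=memeplex.generation + 1)


def generation_stats(active_memeplexes):
    """Minimum, mean and maximum generation of the active memeplexes."""
    gens = [m.generation for m in active_memeplexes]
    return min(gens), mean(gens), max(gens)
```

Applied to the memeplex each agent selected on a census day,
`generation_stats` gives the kind of summary plotted in
Fig. 2.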

Breeders never clone their memeplexes and so they always act
on memeplexes of generation zero. Learners can increase their
memeplex generation but cannot share this with others.
Socializers clone their memeplexes in individual learning and
in social learning. In order to test that cumulative cultural
evolution occurred in our socializers we have measured
memeplex generation over time (see Fig. 2).

The maximum and average memeplex generation increase over
time. The average memeplex generation grows much more slowly,
but its standard deviation also grows over time. The minimum
memeplex generation is almost always zero because there are
usually newborns in the world that have not yet learned the
shared memeplex of the population. A rare case in which this
does not occur is during a colony collapse in which there are
no newborns (see below).

Figure 3: Wasted energy in genes in the genome and memes in
the selected memeplex are measured. We show the average
wasted energy in the population over time.

Figure 4: Proportion of the selected memeplex devoted to
non-gathering actions averaged over all members of the
population.

This is evidence that cumulative cultural evolution is
occurring (Marriott and Chebib, 2016a). This increase over
time implies that memeplexes are improved in one generation,
shared among the members and passed to the next generation.
Improvements made early are preserved in the population from
generation to generation. We have tracked this optimization
over time as well.

In Fig. 3 the divergence between gene optimization and meme
optimization shows that both breeders and socializers can
optimize their behavior. For breeders this can only occur by
optimizing the local parts of their genome that are copied
into the memome and eventually used by the agent. For
socializers this means optimizing the shared memeplexes that
are passed between agents.

The data shown are averages over the population. Newborn
socializers have not had a chance to learn the shared
memeplexes and thus have memeplexes similar to those of the
breeders that don’t optimize. The most optimized memeplexes
in the population have 0 wasted energy after about day 500.
Since this is the most optimized the memeplex can get, this
slows the cumulative evolution. The only role of social
learning after this optimization is to maintain the highly
optimized memeplex (or one of its clones) from generation to
generation.

Another means of optimizing memeplexes is to eliminate time
spent on the non-gathering activities: breeding, learning,
and socializing. We see (Fig. 4) that breeders evolve to
spend more time engaging in these actions over time, as
selection pressure against this is weak. Cultural evolution
occurs much more quickly, and so the weak pressure becomes
much stronger over the same time.


We see that the socializers optimize to spend much less time
on these actions over the first 1000 days. The most optimized
memeplex would spend no energy breeding, learning or
socializing. This is rare, though it occurs. A memeplex with
no breeding component is quite common and suppresses
reproduction in the agent if selected repeatedly (see below).
A memeplex with no learning component is not a big detriment
since the memeplex is likely already very optimized. A
memeplex without a social learning component cannot spread
itself. While more optimal, such memeplexes are rare since
they die with the host agent. As a result most optimized
memeplexes spend no energy or a very small amount of energy
on breeding and usually a little more energy on learning and
socializing. The success of the memeplex depends on its
ability to optimize and spread itself.

Herding. Although we have not conducted a quantitative
analysis of herding in our agents, we can analyze the
underlying mechanisms supporting the herds we observe. We
know from prior work that the herds of the control group are
maintained by common descent (Marriott and Chebib, 2015b).
Herds in the socializers are also maintained through the
common descent of the memeplexes shared among members of the
herd. Herds observed in socializers, even if outwardly
similar, are being maintained by social mechanisms.

Assortative Mating. Assortment of genetically related
parents is not significantly affected by the presence of
learning and social learning (see Fig. 5). There are some
small differences in assortment of phenotypically related
parents, though it is not clear how to interpret this slight
variation (see Fig. 6).
Figure 5: Genetic difference between parents. This data is
plotted on a log scale with base 2 and is averaged over the
130 runs.

Figure 6: Phenotypic difference between parents. This data
is plotted on a log scale with base 2 and is averaged over the
130 runs.

Figure 7: The number of children each agent had. This data
is plotted on a log scale with base 2. This data is cumulative
over all 130 runs.

Philopatry. When an agent died we recorded it in one of four
categories. If the agent died childless then it could not be
evaluated for natal philopatry. If the agent had children
there were three possibilities: it only bred at its birth
site, it sometimes bred at its birth site, or it never bred
at its birth site.
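
The four categories above can be written as a small
classification function. The function name, the category
labels and the representation of a breeding history as a
list of site ids (one per child) are assumptions made for
this sketch.

```python
def classify_philopatry(birth_site, breeding_sites):
    """Assign a dead agent to one of the four philopatry categories.

    breeding_sites: sites of the agent's successful reproductive
    events, one entry per child (an assumed representation).
    """
    if not breeding_sites:
        return "childless"
    at_birth_site = [site == birth_site for site in breeding_sites]
    if all(at_birth_site):
        return "only"
    if any(at_birth_site):
        return "sometimes"
    return "never"
```

Counting the labels over all dead agents in a run yields the
kind of percentages reported for breeders and socializers.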

Breeders had an average of 45.6% childless agents while
socializers had an average of 63.6% childless agents. Of the
breeder agents that had children, 59.9% never bred at their
birth site. Among socializers with children, 49.7% never bred
at their birth site. Fewer agents are breeding in the
socializers but more of them are breeding at their birth
site. Of the breeders that bred at their birth site at least
once, 52.8% did not breed elsewhere during their life,
whereas in socializers 75.3% bred only at their birth sites.

While socializers were more likely to die childless they 
appear to engage in more natal philopatry than the breeders. 
They were more likely to breed at their birth site and they 
were more likely to breed exclusively at their birth site. 

Eusociality. Recall the conditions of eusociality among
animals. Agents must have multi-generational communal
cohabitation, mutual care for the young, and reproductive
division of labor (and sometimes natal philopatry).

Both breeders and socializers have multi-generational 
communal cohabitation. We know both display natal 
philopatry though it is stronger in socializers. So we must 
evaluate whether our agents have reproductive division of 
labor and mutual care for the young. 

At death we recorded the number of children the agent had
during its life. We have plotted this data showing how many
agents died with n children (see Fig. 7). This plot is on a
logarithmic scale and shows an exponential drop off as the
number of children increases. Socializers have a shallower
decrease as the number of children increases. No breeders had
more than twenty children. Among socializers some agents,
though rare, have more than ninety children. Combining this
with the evidence above, we see both that fewer agents are
engaging in sexual reproduction and that those that do are
reproducing more. This apparently meets the criterion of
division of sexual reproductive labor.
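
The data behind a plot such as Fig. 7 can be tabulated as
follows. This is a sketch only: the function name is
hypothetical and the input is assumed to be a flat list
holding the number of children recorded at each agent's
death.

```python
import math
from collections import Counter


def offspring_histogram(children_at_death):
    """Map n children -> log2 of the number of agents that died with n."""
    counts = Counter(children_at_death)
    return {n: math.log2(c) for n, c in sorted(counts.items())}
```

An exponential drop off in the raw counts appears as a
roughly straight descending line in these log2 values, and a
shallower slope indicates a heavier tail of highly
reproductive agents.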

We also gathered data on the age of the eldest agent in the
population. We notice that among the socializers there are
older agents than among the breeders. To have many children
an agent must live long enough to birth each child and gather
the energy required for this activity. This would require an
old agent. It would also require an optimized memeplex that
spent time breeding every day. If this agent also spread this
memeplex to others the efficient breeding culture can be
introduced and maintained in the population.

Agents that do not spend energy breeding can save a lot of
energy, which can extend their lifespan. Recall that the most
optimized memeplexes spend no time breeding, so all energy
can be stored for a long productive childless life. If this
agent also spreads this memeplex to others the childless
culture can be introduced and maintained in the population.
If the memeplex spreads to the whole population then the
population will die out (see below).

Among the optimized memeplexes the dominant kinds are those
with breeding and those without breeding. If spread, each
will generate a different kind of culture. In our populations
these optimized memeplexes occupy agents in the same
population. Not all agents have learned one of these dominant
cultures; those that have not engage in sub-optimal culture.
This is common among younger agents. These different cultures
compete for the participation of agents.

We treat the sharing of optimal memeplexes with the young as
a type of brood care in our simulation, as we have not
realized a brood care mechanism in our agents. Using this as
a tool to evaluate brood care in our agents we can assess
whether there is mutual care for the young.

Agents in our model do not discriminate when social
learning. They can’t recognize their parents or young,
though there is a higher probability of two related agents
occupying the same locations in the environment. Social
learning is also bidirectional. Both agents act as teachers
and learners. However, usually only the less experienced
agent benefits from this exchange.

Now consider an agent with an optimized memeplex that has a
breeding component. After some time this agent produces an
offspring. The newborn has a sub-optimal memeplex and will
likely travel a different path than the parent. If the
offspring is lucky, in the next few days it will encounter
its parent and engage in social learning. Then the offspring
may become an optimal breeder as well. This lucky agent
received direct parent-child brood care. This is not the only
possibility.

The offspring occupies a population with others. Some may be
older siblings, cousins, uncles, etc. while others are more
distantly related. When the offspring first gets a chance to
socially learn it might learn from one of these other agents
(remember they don’t discriminate). This results in the
exchange of sub-optimal memeplexes but possibly also the
enculturation of some sub-optimal culture that brings the
offspring away from its parent. We might consider this a
kind of mutual brood care.

Finally, let’s imagine one of the others in the population
has an optimal memeplex that avoids breeding. This agent
cannot reproduce and create its own brood. It can only spread
its memeplexes to the offspring of agents that breed. Thus,
the existence of this culture relies on the care of other
agents’ offspring. We have strong evidence that these
cultures do indeed exist (see below).

Together this evidence suggests that our socializers have
evolved a type of eusociality. They have multi-generational
cohabitation with mutual care for the young. Like humans,
they have an interesting culture-based division of
reproductive labor. They also engage in natal philopatry more
often than the breeders.

Colony Collapse When a highly optimized memeplex has 
no breeding component but does have social learning it can 
spread into the population as discussed above. The danger 
of this culture is that if every agent in the population follows 
it then the population will die out. 

This cultural suppression can have catastrophic
consequences. A typical run will begin with a handful of small
colonies of agents in different parts of the random geometric
network. When a culture of not breeding emerges and
spreads to every agent in one of these colonies, that colony's
population dies out. In 21 out of 100 socializer runs this led to
every agent in the simulation dying before round 5000. This
never occurs in breeder or learner runs. Inspection of the
memeplexes of agents during a collapse confirms that there
are no breeding components and most agents have a large
store of energy and a long lifespan.

Conclusion 

We were curious how social learning would affect strategies
of sexual reproduction in our simulated agents. We did not
see significant differences between the assortative behavior
of breeders and socializers. We interpret this result as
additional evidence that assortment can be, and probably is,
maintained in most populations by non-social forces.

Social learning also appeared to enforce a higher rate of 
natal philopatry. While fewer social learning agents bred, 
more of them bred at their birth sites and more of them bred 
exclusively at their birth site. This suggests that sociality 
and natal philopatry may correlate in natural populations. 

Finally, after adding social learning to our agents we find
that they evolve a culture of eusocial reproduction.
Reproductive labor is more concentrated both in a sub-population
and occasionally within agents that breed considerably more
than others in the population.

All agents that engage in social learning can be considered
to engage in brood care by sharing culturally learned
information with others. As they don't discriminate when
learning socially and there is multi-generational cohabitation,
this brood care occurs between agents of different generations
and of different relatedness.

These are the criteria for eusociality applied to animals,
and in applying these criteria to our agents we can see there
is evidence to call them eusocial. It is interesting to us that
our relatively simple social exchange mechanism is strong
enough to evolve a eusocial culture in our agents. It is
noteworthy that this eusociality is maintained by cultural
forces, not genetic forces. That is, whether our social agents
breed or not depends not on their genetics but on
their learned culture.

Abstract

Autonomous task allocation is a desirable feature of robot
swarms that collect and deliver items in scenarios where
congestion, caused by accumulated items or robots, can
temporarily interfere with swarm behaviour. In such settings,
self-regulation of workforce can prevent unnecessary energy
consumption. We explore two types of self-regulation:
non-social, where robots become idle upon experiencing
congestion, and social, where robots broadcast information about
congestion to their team mates in order to socially inhibit
foraging. We show that while both types of self-regulation can
lead to improved energy efficiency and increase the amount
of resource collected, the speed with which information about
congestion flows through a swarm affects the scalability of
these algorithms.

Introduction 

Congestion is an important factor that can negatively
affect the performance of robot swarms (Hoff et al., 2010).
Today's robotic systems, utilised in automated warehouses
(D'Andrea, 2012), agriculture (Cartade et al., 2012), or in
hospitals (Thiel et al., 2009), must maintain effective work
schedules for individual robots in order to minimise
interference between robots and save energy. Decentralised task
allocation, which affords redundancy and scalability, has
been proposed as a possible solution (Wawerla and Vaughan,
2010; D'Andrea, 2012) that is likely to become more
important in the near future as autonomous robot swarms will
increasingly be deployed in unstructured and dynamic
environments. In this paper, we explore the problem faced by
a robot swarm that collects items from the environment and
can cope with congestion by regulating its workforce in a
decentralised manner in order to save energy. Furthermore,
we explore how the means by which information about
congestion is obtained by the robots affects the scalability of the
swarm's performance under various foraging conditions.

During foraging, congestion can either result from the
size of the robot population or from the structure of the
environment. For example, robots might be required to wait until
an occupied resource drop-off location becomes accessible
(Wawerla and Vaughan, 2010) or they might need to queue
in order to leave a crowded drop-off location to perform
more work (Krieger and Billeter, 2000). An autonomous
robot swarm should be able to sense when congestion has
become a problem and adjust its workforce accordingly.

We simulate robot swarms that collect items from the
environment and drop them off in a central base, from where
the items are consumed at a given rate. We explore two types
of workforce self-regulation: non-social, where robots
become idle upon directly experiencing severe congestion, and
social, where a robot will inhibit the foraging of its team
mates by signalling them to become idle when the
congestion that it experiences is severe. We show that both types
of self-regulation can lead to significant energy savings and
thus to a greater number of items collected when the
energy available to the robots is limited. More importantly,
we demonstrate that the speed with which information about
congestion flows through a swarm, either when robots
detect congestion themselves or when they exchange
information with their team mates, affects the swarm's ability to
respond to it appropriately. While social self-regulation
results in rapid information flow and can lead to significant
performance benefits in certain scenarios, it can also lead
to significantly worse performance in others. On the other
hand, non-social self-regulation, where information flow is
slower, leads to improved energy efficiency across a greater
number of foraging conditions, making it more suitable in
unknown environments, although it is outperformed by
social self-regulation in some cases.

The following sections provide an overview of related
work and a description of our simulation and analysis
methods. We then compare, across a number of experimental
scenarios, the performance of our two types of self-regulated
swarms with that of control swarms that do not use
self-regulation. We evaluate both the amount of energy needed to
collect items and the number of items collected when robot
energy is limited. We conclude with a discussion of how
our results relate to our previous work on information flow
in swarms (Pitonakova et al., 2016) and provide examples of
real-world applications where the two types of self-regulated
swarms could be used.

Related Work


Route planning systems that optimise robot traffic are
often used in controlled warehouse environments with small
robot teams (Vivaldini et al., 2010; Mather and Hsieh, 2012).
However, such approaches require a centralised controller
to guide robot behaviour and are thus unsuitable for large
swarms, where computation of optimal solutions becomes
infeasible (Dahl et al., 2009). Furthermore, a model of the
environment and of the tasks within it, which centralised
planning systems rely on, can be difficult to obtain in
dynamic or unstructured environments. On the other hand,
decentralised decision making, where robots change their
behaviour based on limited local information obtained through
their sensors, is more suitable for complex tasks of this type
(Hoff et al., 2010).

Decentralised robot decision making has been well
studied in a number of logistic and foraging applications. For
example, unmanned vehicles that need to transport items from
one location to another can adjust their work time and decide
to recruit others based on the number of items in pick-up
locations (Wawerla and Vaughan, 2010). In behaviour-based
robotics, a combination of environmental cues, such as the
presence of items or other robots nearby, can trigger or
inhibit foraging behaviour, leading to self-organised division
of labour between robots that are idle and those that collect
resources (Jones and Mataric, 2003). Alternatively, 'bucket
brigading' robots can form chains of work areas and
progressively transport items between two locations (Shell and
Mataric, 2006; Pini et al., 2013) and even adapt the size of
the work areas based on collisions with other robots in order
to improve their performance (Lein and Vaughan, 2009).

The Response Threshold Model (RTM) is a self-regulatory
mechanism inspired by social insects (Bonabeau
et al., 1997) that has been applied in a number of simulated
and real-world robot experiments. According to the model,
robots alternate between foraging and resting based on some
internal, environmental or social cues in order to optimise
their energy consumption. For example, robots can count
the number of items stored in the base and only leave to
forage when the number is below a specified threshold (Yang
et al., 2009). Robots can also evaluate how many items they
encountered during foraging and decide to rest if the
environment is not rich enough (Labella et al., 2006). By
counting the number of collisions with other robots (Liu et al.,
2007) or by detecting drops in their own expected
performance (Dahl et al., 2009), robots can decide to rest if they
estimate that congestion is beyond an acceptable level.
Finally, in dynamic environments, where the number of items
in the environment changes over time, robots can decrease
the energy cost of collecting items by only foraging when
enough items are available, estimating the state of the
environment in either a centralised (Liu et al., 2007) or
decentralised (Dai, 2009) manner.

Figure 1: ARGoS simulation screenshot of the base and a
deposit at distance D from the base edge. The base consists of
a resting bay, an observation bay and an unloading bay. A
light source is placed above the centre of the base to guide
robot navigation. Pellets collected by the robots temporarily
accumulate in the unloading bay, causing congestion.

Our work builds on the Response Threshold Model
literature, and in particular on the work of Liu et al. (2007) and
Dai (2009), where robots estimated the level of congestion
in order to prevent unnecessary energy consumption.
However, we apply the RTM in a novel scenario, where
successful foragers recruit other robots to the worksites that they
are exploiting (see also Pitonakova et al., 2014, 2016).

Furthermore, we provide novel insights into the role
played by information flow in decentralised congestion
estimation. In our non-social model, congestion is estimated
by each robot individually, while in the social model, robots
communicate their estimates to nearby robots in order to
socially inhibit foraging.

Our approach to congestion estimation is inspired by the
self-regulatory behaviour of honey bees foraging for nectar
(Anderson and Ratnieks, 1999; Gregson et al., 2003). When
nectar is abundant, foragers may bring more nectar into the
hive than the nectar-receiving bees can cope with. In order to
prevent unnecessary foraging, individual foraging bees
evaluate how long it takes for their nectar to be unloaded. If
unloading is taking too long, a forager will tremble dance
around the nest, inhibiting other bees from recruiting and
thus reducing the number of foragers. Our social RTM uses
a similar principle.

Methods

Environment 

All our experiments are performed in the ARGoS
simulation environment, which implements realistic 3D physics and
robot models (Pinciroli et al., 2012). The simulation has
continuous space and updates 10 times per simulated
second. A circular base with a diameter of 3 metres is situated
in the centre of the experimental arena. The base is divided
into three sections (Figure 1): an interior circular resting bay
with an annular observation bay around it and an annular
unloading bay around that. A light source, placed above the
centre of the base, is used by robots as a reference for
navigation towards and away from the base centre (as in, e.g.,
Krieger and Billeter, 2000; Pini et al., 2013).

Cylindrical resource deposits with radius r_D are placed
outside of the base, each containing an unlimited volume of
resource. In order to enable robots close to a deposit to move
towards it, a colour gradient with radius r_C is present on the
floor around each deposit.

We explore two types of scenarios:

• HeapN: N ≤ 4 deposits distributed evenly around the
base at a distance D ∈ {5, 7, 9}m from the base edge.
These deposits represent large heaps of resource (e.g.,
mineral deposits), with r_D = 0.5m and r_C = 3m.

• ScatterN: N ≥ 10 deposits randomly distributed between
D − 5m and D + 5m from the base edge. These deposits
are small (e.g., litter on a street), with r_D = 0.1m and
r_C = 1m.
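
The two deposit layouts can be sketched as follows. This is a minimal illustration under stated assumptions, not the ARGoS configuration used in the experiments: the function names are hypothetical, distances are measured from the base edge to the deposit centre, and Scatter deposits are drawn with uniformly random bearing and distance.

```python
import math
import random

BASE_RADIUS_M = 1.5  # the base has a diameter of 3 metres


def heap_positions(n, d):
    """Return n deposit centres spaced evenly around the base,
    each d metres from the base edge (hypothetical helper)."""
    r = BASE_RADIUS_M + d
    return [(r * math.cos(2 * math.pi * i / n),
             r * math.sin(2 * math.pi * i / n)) for i in range(n)]


def scatter_positions(n, d, rng=None):
    """Return n deposit centres placed between d - 5 and d + 5 metres
    from the base edge. Uniform sampling is an assumption."""
    rng = rng or random.Random(0)
    positions = []
    for _ in range(n):
        r = BASE_RADIUS_M + rng.uniform(d - 5, d + 5)
        angle = rng.uniform(0, 2 * math.pi)
        positions.append((r * math.cos(angle), r * math.sin(angle)))
    return positions


heap4 = heap_positions(4, 7)          # four heaps, 7 m from the base edge
scatter25 = scatter_positions(25, 7)  # 25 small deposits, 2 m to 12 m away
```

With evenly spaced heaps, each deposit feeds a distinct sector of the unloading bay, whereas scattered deposits load the whole perimeter at once.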

Robots 

The simulated MarXbots (Bonani et al., 2010) are
circular, differentially steered robots with a diameter of 0.17m
that can reach a maximum speed of 5cm/s in our
simulation. The robots use infra-red sensors for obstacle avoidance
and communication, colour sensors for navigation towards
nearby deposits, and a light sensor for phototaxis towards
the base (see Pitonakova et al., 2016, for more details). The
robots are modelled as finite-state machines and can
implement three types of homogeneous swarm: control swarm,
non-social self-regulators, and social self-regulators
(Figure 2).

Figure 2: Finite state machine representation of the robot
controller in swarms with (a) non-social self-regulation, and
(b) social self-regulation. The behaviour of the control
swarm controller is displayed in dashed boxes.

Control swarm robots exhibit basic foraging behaviour
with no self-regulation. A robot starts with a random
orientation and a random position in the observation bay as an
observer, ready to receive and follow recruitment signals.
An observer moves randomly across the observation bay
and avoids traveling into the unloading and resting bays. At
each time step an observer can become a scout with scouting
probability p(S) = 10^-3. A scout leaves the base and uses
Lévy movement (Reynolds and Rhodes, 2009) to search for
a resource deposit within 20m of the base. The robot
updates its estimated location relative to the base using path
integration based on odometry at each time step (e.g.,
Lemmens et al., 2008; Gutierrez et al., 2010). When a deposit
is found, the robot loads one unit of volume of resource and
returns to the base utilising phototaxis, while keeping
track of its position relative to the deposit using odometry.
Odometry noise is not modelled. Any scout that cannot find
a deposit within 600s returns to the base and becomes an
observer.
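
Path integration from odometry can be illustrated with the standard dead-reckoning update for a differentially steered robot. This is a generic sketch rather than the controller used in the simulation: the function name, the wheel-speed interface and the wheel base of 0.14m are assumptions made for the example. Noise is omitted, matching the statement that odometry noise is not modelled.

```python
import math


def integrate_odometry(pose, v_left, v_right, wheel_base, dt):
    """Advance an (x, y, heading) estimate by one time step.

    v_left and v_right are wheel speeds in m/s, wheel_base is the
    distance between the wheels in metres, dt is the step in seconds.
    """
    x, y, theta = pose
    v = (v_left + v_right) / 2.0             # forward speed
    omega = (v_right - v_left) / wheel_base  # turn rate in rad/s
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta = (theta + omega * dt) % (2 * math.pi)
    return (x, y, theta)


# One simulated second at the maximum speed of 5 cm/s, 10 updates per second.
pose = (0.0, 0.0, 0.0)
for _ in range(10):
    pose = integrate_odometry(pose, 0.05, 0.05, 0.14, 0.1)
```

Keeping the estimate relative to the base (or to a deposit) is what lets a robot return to a deposit it has found and communicate its location to others.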

A laden robot returning to the base drops off its load in the
unloading bay in the form of four pellets of size 0.1m^3. The
robots cannot push existing pellets around and thus have to
avoid them in order to traverse the unloading bay. A new
pellet can only be deposited when there is enough free space
in front of the robot. Deposited pellets disappear from the
simulation (representing their utilisation by a hypothetical
unmodelled system of robots or human users) after a period
of unloading bay handling time, t_H. When t_H = 1s, pellets
disappear very quickly and do not cause congestion. By
increasing the value of t_H, we can experiment with the level
of congestion in the simulation, as more accumulated pellets
make entering and leaving the base more difficult.

After depositing the pellets, the robot moves further into
the base and performs recruitment for 120s, randomly
moving across the base while avoiding re-entering the unloading
bay. A recruiter advertises the fact that it has information
about a deposit to all observers located within recruitment
range of 0.6m. Deposit location is communicated to each
observer in a one-to-one fashion by taking into account the
local axes of the robots and their alignment relative to each
other (Gutierrez et al., 2010). The recruiter resumes
foraging from the same deposit after it completes recruitment.

In self-regulated swarms, robots additionally measure
their pellet unloading time, t_U, i.e., the time between
entering the base and leaving the unloading bay. Robots in
swarms with non-social self-regulation (Figure 2a) proceed
to the resting bay and become idle after depositing pellets
if congestion is severe (i.e., t_U > 60s). An idle robot
consumes a negligible amount of energy (as in, e.g., Wawerla
and Vaughan, 2010) and can be woken up and immediately
recruited by a recruiter, i.e., by a robot that does not
experience severe congestion. In order to avoid deadlocks, an idle
robot can also wake up spontaneously with a waking
probability p(W) = 10^-4.
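
To give a sense of scale for the scouting and waking probabilities, the expected waiting time of a per-step event follows from the mean of a geometric distribution. The sketch below assumes that the waking probability, like the scouting probability, is applied once per simulation time step (the text states this only for scouting); the function name is hypothetical.

```python
STEPS_PER_SECOND = 10  # the simulation updates 10 times per simulated second


def expected_wait_seconds(p_per_step, steps_per_second=STEPS_PER_SECOND):
    """Mean time until an event with per-step probability p_per_step occurs."""
    expected_steps = 1.0 / p_per_step  # mean of a geometric distribution
    return expected_steps / steps_per_second


scout_wait = expected_wait_seconds(1e-3)  # about 100 s for an observer
wake_wait = expected_wait_seconds(1e-4)   # about 1000 s for an idle robot
```

Under this assumption an undisturbed idle robot rests for roughly a quarter of an hour on average, which is short relative to a 4-hour run but long relative to a single foraging trip.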

Robots in swarms with social self-regulation (Figure 2b)
do not become idle after experiencing severe congestion
(i.e., when t_U > 80s), but become tremble dancers instead.
A tremble dancer travels randomly across the observation
bay and broadcasts stop signals to all robots within a 0.9m
range for 120s, after which it leaves the base to resume
foraging without recruiting. Stop signals inhibit foraging, as
any observer or forager that receives a stop signal moves to
the resting bay and becomes idle. Stop signals also inhibit
recruitment, as they cause any recruiter in range to cease
recruiting and immediately leave the base to forage.
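
The two self-regulation rules can be summarised as a small decision function. This is a simplified sketch of the transitions described above, not the finite-state machine used in the simulation: state names are plain strings chosen for the example, and timers, movement and message passing are left out.

```python
NON_SOCIAL_THRESHOLD_S = 60.0  # unloading time above which a non-social robot rests
SOCIAL_THRESHOLD_S = 80.0      # unloading time above which a social robot tremble dances


def state_after_unloading(swarm_type, unloading_time_s):
    """Return the state a robot enters after depositing its pellets."""
    if swarm_type == "non-social" and unloading_time_s > NON_SOCIAL_THRESHOLD_S:
        return "idle"
    if swarm_type == "social" and unloading_time_s > SOCIAL_THRESHOLD_S:
        return "tremble dancer"
    return "recruiter"  # control behaviour: recruit, then resume foraging


def state_after_stop_signal(state):
    """Return the state a robot in a social swarm enters on a stop signal."""
    if state in ("observer", "forager"):
        return "idle"     # goes to the resting bay
    if state == "recruiter":
        return "forager"  # ceases recruiting and leaves the base
    return state
```

The asymmetry is the point: a non-social robot only ever removes itself from the workforce, while one tremble dancer can remove every observer and forager within its signal range.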

Analysis 

We performed 50 simulation runs that lasted 4 simulated
hours in Heap1, Heap4, Scatter10 and Scatter25 scenarios
and compared the performance of all three swarm types
using N_R = 25 and N_R = 50 robots. Since we are interested
in efficient energy usage, we define a performance metric,
energy efficiency, C, which represents the amount of energy
a swarm spends in order to collect a unit of resource:

    C = E / R

where R is the total amount of resource collected by the
swarm and E is the total amount of energy expended by
the swarm. It is assumed that an idle robot expends 0
units of energy per second and a robot in any other state
expends 1 unit of energy per second. Since the control
swarm robots are never idle, control swarms spend a total
of N_R × (4 × 60 × 60) = N_R × 14,400 units of energy
in each 4-hour experiment. We compare C values achieved
by the two types of self-regulated swarms with that achieved
by the control swarms in order to find out how advantageous
self-regulation was in different scenarios.
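
The energy accounting can be made concrete with a short calculation. The sketch below takes energy efficiency to be energy spent per unit of resource collected, as the description above states; the function names and the resource and idle-time figures in the usage lines are illustrative placeholders, not results from the experiments.

```python
RUN_DURATION_S = 4 * 60 * 60  # 14,400 simulated seconds per run


def swarm_energy(idle_seconds_per_robot, duration_s=RUN_DURATION_S):
    """Total energy: 1 unit per robot per non-idle second, 0 while idle."""
    return sum(duration_s - idle for idle in idle_seconds_per_robot)


def energy_efficiency(energy, resource):
    """Energy spent per unit of resource collected (lower is cheaper)."""
    if resource <= 0:
        raise ValueError("no resource collected")
    return energy / resource


# A control swarm of 25 robots is never idle.
control_energy = swarm_energy([0] * 25)  # 25 * 14,400 = 360,000
# Hypothetical self-regulated swarm: each robot idles for one hour.
regulated_energy = swarm_energy([3600] * 25)  # 25 * 10,800 = 270,000
# Placeholder resource total, used only to show the call.
example_cost = energy_efficiency(control_energy, 1000)
```

Because control swarms always spend the full amount, any difference in their energy efficiency between scenarios comes from the resource total alone.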


We also analyse how much resource the swarms collected
when energy availability was limited. During this analysis,
it is assumed that all robots stop working when the swarm
has spent N_R × E' units of energy, where E' is the energy
limit per robot. Energy limits may play a role, for example, in
planet exploration, where robots might use a common
solar-powered energy repository of a limited capacity.
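
One way to carry out such an analysis is to truncate a recorded run at the point where the swarm's energy budget is exhausted. The sketch below assumes a per-time-step log of energy spent and resource delivered, which is a hypothetical data format, and stops at the first step that would exceed the budget.

```python
def resource_within_budget(log, n_robots, limit_per_robot):
    """Sum the resource delivered before the swarm-wide budget runs out.

    log is an iterable of (energy_spent, resource_delivered) pairs,
    one pair per time step for the whole swarm.
    """
    budget = n_robots * limit_per_robot
    spent = 0.0
    collected = 0.0
    for energy, resource in log:
        if spent + energy > budget:
            break  # all robots stop working once the budget is reached
        spent += energy
        collected += resource
    return collected


# Toy log for a two-robot swarm (made-up numbers).
toy_log = [(10, 1), (10, 0), (5, 2), (10, 1)]
tight = resource_within_budget(toy_log, n_robots=2, limit_per_robot=12)
loose = resource_within_budget(toy_log, n_robots=2, limit_per_robot=13)
```

A swarm whose robots spend time idle reaches the budget later in simulated time, so it has more time in which to deliver resource before the cut-off.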

Simulation Results 

In the following sections, we compare the control swarms to
each of the two kinds of self-regulated swarms in terms
of their energy efficiency, C, and the amount of resource
they collected, R. We show that the self-regulated swarms
can achieve better C in scenarios where pellets cause
significant congestion. Furthermore, self-regulation leads to a
higher amount of resource collected when the total energy
available to the swarms is limited in such scenarios. We also
discuss cases when self-regulation leads to performance
deterioration, especially when social self-regulation is used.

Energy efficiency 

In this section we report the performance (in terms of energy
efficiency) of different swarm types in each of 48 scenarios:
2 swarm sizes (25 and 50) × 2 unloading bay handling times
(5s and 20s) × 3 deposit distances (5m, 7m, and 9m) ×
4 deposit distribution types (Heap1, Heap4, Scatter10 and
Scatter25). In each case, we report the average performance
of 50 self-regulated swarms relative to the average
performance of 50 control swarms in the same scenario.
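
The 48 scenarios are the Cartesian product of the four experimental factors, which can be enumerated directly. The variable names below are chosen for the example; the total run count assumes 50 runs for each of the three swarm types in every scenario, as the text suggests.

```python
from itertools import product

SWARM_SIZES = (25, 50)
HANDLING_TIMES_S = (5, 20)
DEPOSIT_DISTANCES_M = (5, 7, 9)
DISTRIBUTIONS = ("Heap1", "Heap4", "Scatter10", "Scatter25")

# Every combination of swarm size, handling time, distance and distribution.
scenarios = list(product(SWARM_SIZES, HANDLING_TIMES_S,
                         DEPOSIT_DISTANCES_M, DISTRIBUTIONS))

SWARM_TYPES = ("control", "non-social", "social")
RUNS_PER_CONDITION = 50
total_runs = len(scenarios) * len(SWARM_TYPES) * RUNS_PER_CONDITION
```

Here `len(scenarios)` is 2 × 2 × 3 × 4 = 48, and under the stated assumption the study comprises 48 × 3 × 50 = 7200 simulation runs.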

First we will summarise the performance of the control
swarms, depicted in Figure 3. Their resource collection
performance was more attenuated by congestion when the
number of robots was large (N_R = 50) and when unloaded
pellets did not disappear quickly from the unloading bay
(t_H = 20s). Congestion was especially problematic in
scenarios with a large number of deposits and when deposits
were closer to the base. More severe performance
deterioration was measured in the Scatter scenarios, where multiple
foraging locations were exploited at the same time, causing
fast pellet accumulation around the whole unloading bay.

Figure 3: Resource collection performance of control
swarms relative to experiments with no congestion (i.e.,
when t_H = 1s).

Figure 4: Performance of non-social self-regulated swarms
relative to control swarms.

Figure 5: Performance of social self-regulated swarms
relative to control swarms.

Evaluating the performance of non-social self-regulated
swarms relative to that of control swarms (Figure 4)
indicates that non-social self-regulators tended to enjoy more of
an advantage when control swarms were more affected by
congestion. Consequently, where congestion was very mild
(e.g., a small number of robots foraging for a few heaped
deposits distributed far from a base that handles unloaded
deposits quickly), the performance of non-social swarms and
control swarms was very similar (≈ ±1% difference), and
control swarms even enjoyed a 5% advantage in the mildest
Scatter10 scenario. However, where congestion tended to
be severe (e.g., a large number of robots foraging for many
scattered deposits distributed near to a base that handles
unloaded deposits slowly), the performance of non-social
swarms was considerably greater than that of control swarms
(up to ≈ +40% in the most extreme Scatter25
environments).

The performance of social self-regulatory swarms
relative to the control swarms follows a similar but more
complicated pattern (Figure 5). Again, where congestion tended
to be severe, the performance of social swarms was better
than that of control swarms (up to ≈ +67% in the most
extreme Scatter25 environments). Moreover, in these
scenarios, social swarms did even better than non-social swarms,
achieving an advantage over the control swarms that was
often between 20% and 40% larger than that achieved by
non-social swarms. Conversely, in scenarios where congestion
was very mild, the performance of social swarms was worse
than that of control swarms and non-social swarms by as
much as −14%.

In general, there were two factors that affected the
advantage of self-regulation: the amount of congestion in the
base and the distribution of deposits in the environment. For
instance, self-regulation was most advantageous in Scatter
scenarios when deposits were close to the base (i.e., when
the control swarms experienced high congestion due to short
trips between the base and the deposits), and, more
importantly, when foraging effort could be refocussed in a new
direction once a particular part of the unloading bay became
congested. On the other hand, self-regulation was not as
effective in the Heap1 scenarios, where all resources were
concentrated in a single location. Robots in the self-regulated
swarms could still become idle when pellets accumulated,
but recruitment could only take place again when the
foraging robots measured a low unloading time, i.e., when enough
of the pellets that had been unloaded in the part of the
unloading bay between the deposit heap and the resting bay
had disappeared. This was a particular problem for the
swarms with social self-regulation, where the information
about congestion spread quickly through the swarm, causing
a majority of the robots to become idle. Unlike in non-social
swarms, the number of foraging robots was often very low
and it took the social swarms a long time to recover from
inactivity.

Figure 6: Resource collection performance of self-regulated
swarms relative to control swarms under unlimited and
limited energy conditions for scenarios with (a, c) mild
congestion, and (b, d) severe congestion.


Resource collection 

In this section we report the average performance (now in
terms of total amount of resource collected) of the
self-regulated swarms relative to the average performance of the
control swarms. As in the previous section, we consider
cases with mild congestion (N_R = 25, t_H = 5s) and severe
congestion (N_R = 50, t_H = 20s). We first consider
experiments where the total energy available to the swarms was
unlimited. We then report on experiments with energy
limitation, where robots ceased foraging as soon as their swarm
had consumed N_R × E' units of energy, where E' was the
energy limit per robot. Figure 6 depicts our results.

Although both types of self-regulated swarms were often
more energy efficient than control swarms, when energy was
unlimited they did not tend to collect more resource than the
control swarms. Non-social self-regulators tended to collect
a similar quantity to control swarms, whereas social
self-regulators collected less resource than control swarms when
congestion was mild (Figure 6a, c).

When swarm energy was limited, non-social
self-regulators tended to either collect significantly more
resource than control swarms (when congestion was severe),
or roughly the same amount as control swarms (when
congestion was mild). For instance, when E' was set to 8000
and when a large number of robots foraged from a base that
handled unloaded deposits slowly in the Scatter25 scenario,
non-social swarms foraged up to 30% more resource
relative to control swarms (Figure 6b). In experiments where
congestion was mild, the advantage of non-social swarms
was less pronounced. For instance, when a small number of
robots foraged from a base that handled unloaded deposits
quickly, non-social self-regulated swarms only collected up
to 10% more resource than control swarms (Figure 6a).

Social self-regulation again led to more extreme variation
in performance when the swarm energy was limited. When
congestion was severe, social self-regulators tended to
collect significantly more resource than either control swarms
or non-social self-regulators (Figure 6b and 6d), whereas
when congestion was mild, they collected roughly the same
amount as control swarms and non-social self-regulators
(Figure 6a and 6c). For instance, in Scatter25, social
self-regulators collected approximately 55% more resource than
the control swarms when E' = 8000 (Figure 6b). On the
other hand, in Heap1, where the robots could not spread
their foraging effort to other directions once a particular
part of the unloading bay became congested, social
self-regulators collected on average 10% less resource than the
control swarms when congestion was mild (Figure 6c). In
both cases, variation in performance within a scenario was
higher for social swarms.

When the value of E' was higher or lower than 8000,
the relative performance of both self-regulated swarms
decreased linearly but was never lower than when the swarm
energy was unlimited.

Discussion and Conclusions 

We have shown that swarms can regulate their foraging
activity effectively on the basis of locally perceived levels of
congestion. The solution presented in this paper extends
previous studies of the Response Threshold Model (RTM)
(e.g., Liu et al., 2007; Dahl et al., 2009; Yang et al., 2009),
applying it for the first time to foraging swarms that use
recruitment and investigating the effect of information sharing
during decentralised congestion estimation.

We compared three types of swarms: control swarms with
no self-regulation, swarms with non-social self-regulation
(where robots become idle when they directly sense severe
congestion), and swarms with social self-regulation (where
robots instruct their team mates to become idle when they
detect severe congestion). The swarms were assessed across
a number of experimental scenarios, where we varied the
number of deposits (N_D), deposit distance from the base
(D), the number of robots (N_R), and the time it took for
accumulated material to be consumed at the base (t_H). We
evaluated the performance of the swarms in terms of
energy efficiency, C, and showed that C can be improved
through self-regulation, especially in environments where the
collected material accumulates in the base quickly (because
D is small, or N_R or t_H are large) or where the swarms
can exploit multiple foraging directions simultaneously (i.e.,
because N_D is large). Additionally, we demonstrated that
self-regulated swarms collect more resources than control
swarms when the energy supply available to the robots is
limited.

There were notable differences in how swarms with non¬ 
social and social self-regulation performed in the various ex¬ 
perimental scenarios. By comparison with control swarm 
behaviour, non-social self-regulation led to mediocre perfor¬ 
mance improvements or equivalent levels of performance in 
some scenarios. On the other hand, social self-regulation 
achieved large improvements over control swarms in sce¬ 
narios where pellets accumulated quickly, but were also out¬ 
performed by control swarms in scenarios where congestion 
was not as severe or where all resources were concentrated 
in a single location. In our work on information flow in for¬ 
aging swarms that use recruitment (Pitonakova et al., 2016), 
we argued that fast information flow can lead to patholog¬ 
ical states of a whole swarm that prevent the swarm from 
responding to changes in the environment. Furthermore, we 
demonstrated that while swarms with fast information flow 
tend to perform extremely well in a limited number of envi¬ 
ronments but perform poorly in others, swarms with slow in¬ 
formation flow tend to perform well across a broad spectrum 
of scenarios. In this paper, we extend this argument to sce¬ 
narios involving congestion. Information flow was slower in 
swarms of non-social self-regulators which relied on their 
own local perception alone, and it was faster in swarms 
of social self-regulators which communicated information 
about congestion to one another. As was the case in Pitonakova
et al. (2016), slow information flow led to behaviour
suitable for a larger number of experimental scenarios, while
fast information flow caused more extreme variation in performance,
meaning it was only appropriate in a restricted set
of scenarios.

Consequently, robots inspired by our social self-regulated 
swarms could be applied effectively in appropriate well- 
defined foraging or logistic tasks, for example to deliver 
items between various locations in warehouses and hospi¬ 
tals, or to collect crops. In these scenarios, the relevant task 
parameters (swarm size, processing time of collected items, 
etc.) are known upfront. However, if we were to employ 
robot swarms in an unknown or more variable environment, 
e.g., work sites on different planets or underwater, we would 
need to take into account the fact that while fast informa¬ 
tion flow can lead to beneficially fast response times, it can 
also cause significantly suboptimal performance under cer¬ 
tain conditions. In such applications, self-regulation that is 
more subtle and occurs in a more localised fashion would 
be more suitable, not because of the ability of the swarms 
to perform work faster or more efficiently, but because such 
collective behaviour is more scalable. It might also be ad¬ 
vantageous to create an adaptive algorithm, where robots al¬ 
ter their own self-regulatory behaviours (for example their 


willingness to exchange information with others, their wak¬ 
ing up probability, etc.), in order to achieve a level of infor¬ 
mation flow within the swarm that varies dynamically in a 
way that is appropriate to the swarm’s current environment. 
Abstract

This paper presents an investigation into a population of 
robots that evolves through embodied evolution — an evo¬ 
lutionary process that is not centrally controlled, but emerges 
from robot interactions just as natural evolution does. The 
robots select their partners randomly, without reference to 
any assessment of task performance, but the environment is 
biased to promote task behaviour by awarding additional life¬ 
time to robots that pick up pucks. The experiments show that 
the robots do learn to pick up pucks in such a setting. Con¬ 
trary to what one might expect, increasing the amount of ad¬ 
ditional lifetime awarded decreases task performance for all 
settings considered. Closer analysis shows that this decrease 
is in part due to the fact that the increased lifespan decreases 
the number of opportunities to spread a robot’s genome, but 
that increasing the award level also negatively affects selec¬ 
tion pressure when there is opportunity for robots to spread 
their genome. We conclude that higher rewards overly em¬ 
phasise one aspect of robot behaviour and in doing so prevent 
evolution from exploring the behaviour space. 

Introduction 

Embodied evolution, or more generally on-line evolutionary 
robotics, subscribes to a view of collectives of robots that 
are released in uncharted, possibly changing, environments. 
The robots learn to operate in that environment, of which 
the particulars are unknown at design time, through evolu¬ 
tion (Watson et al., 2002; Haasdijk et al., 2014). Just as in 
natural evolution, the process is not centrally orchestrated 
(in contrast to most evolutionary computing research), but 
evolution emerges from local interactions between the indi¬ 
viduals: they survive to meet other individuals and between 
them decide to procreate, or not. For such systems to be 
of practical relevance, evolution must serve two purposes. 
Firstly, the robots must adapt to their environment so that 
they get the opportunity to procreate. The robots must, for 
instance, learn to move about to spread their genomes, or 
they must maintain their energy levels by regularly visiting 
charging stations. Secondly, the robots must perform some 
user-defined task: monitoring, patrolling, surveying, mining 
or harvesting are often considered in these kinds of scenarios 
(Bellingham and Rajan, 2007). 


Although the environment in which robots operate does 
not specify any crisply defined objective function, it does 
indirectly circumscribe goals for the population of robots 
to survive and evolve. This implies environmental selec¬ 
tion pressure that drives adaptation to the environment with¬ 
out reference to any user-defined task. There are two ways 
to augment this selection to also drive towards task performance:
firstly, the robots can explicitly assess their task performance
and add a second tier of selection (mate selection).
Secondly, and more commonly in artificial life than in evo¬ 
lutionary robotics research, the environment can be modi¬ 
fied so that individuals with appropriate behaviour receive 
an environmental advantage that increases their chances of 
reproductive success, e.g. by increasing their lifespan. In 
the latter case, task performance is not explicitly selected 
for, but the natural selection process is biased to promote 
task performance. 

A well-known example of an evolutionary system which
explicitly selects for task performance (and the origin of the
term embodied evolution) is that by Watson et al. (2002).
Watson et al. added selection on the basis of an individ¬ 
ual’s prowess in a resource gathering task by varying the 
frequency at which robots would attempt to broadcast their 
genomes: better task performance increased this frequency. 
Thus, the task is explicitly defined and the robots assess their 
own task performance to drive selection. Haasdijk (2015) in¬ 
vestigated the interplay of natural, environmental selection 
and explicit task-based selection, finding that explicit selec¬ 
tion for task performance results in substantially higher se¬ 
lection pressure than that imposed indirectly by the environ¬ 
ment, causing robots to prefer environmentally detrimental 
behaviour if that improves their odds in explicit selection. 

This paper considers the alternative method of promoting 
task behaviour: the rules of the environment are modified 
so that robots that act appropriately benefit through the en¬ 
vironment improving their chance of reproductive success. 
A good example of such biased natural selection (Bredeche 
and Montanier, 2012) can be seen in the Avida system by 
Adami and Brown (1994). Here, the individuals are pro¬ 
grams in a virtual machine that can procreate by making 
copies of themselves, just as in Ray’s seminal Tierra system
(1991). Selection for task performance is then added
by increasing the clock rate for individuals that perform par¬ 
ticular calculations well: this increases the speed at which 
they can generate copies and so bestows a reproductive ad¬ 
vantage. The famous poisonous food experiment by Todd 
and Miller (1990) provides another example. In this exper¬ 
iment, the task is to collect resources, and virtual agents 
that gather (‘eat’) the right type of resource increase their 
lifespan, while eating the wrong type of resource decreases 
lifetime. The difference in lifetime implies a difference in 
reproductive success: agents that live longer (i.e., agents 
that eat ‘healthy’ plants) have more opportunity to procre¬ 
ate. Similar approaches use ‘virtual energy’ as an indicator 
of task performance to then increase lifespan and so give re¬ 
productive advantage, e.g., Elfwing et al. (2005) and Weel 
et al. (2012).

The experiments in this paper take a similar approach in 
a resource gathering scenario: simulated robots can collect 
pucks that extend their lifetime by a certain fraction of their
initial lifespan (the reward level). The resultant reproduc¬ 
tive benefits promote task behaviour and so induce selection 
pressure towards puck gathering behaviour. The research 
question we consider in this paper is how the size of the re¬ 
ward influences the course of evolution. 

Experimental Set Up 

The experiments in this paper are based on the MONEE 
experiments by Haasdijk (2015) (from which a substantial 
part of this section is taken), which in turn extend Bredeche 
et al.’s mEDEA system (2012). In mEDEA, there is no ob¬ 
jective to optimise: robots can exchange genetic material 
that encodes their controllers whenever they come within a 
certain maximum distance of each other (e.g., in range for 
infrared communication). A robot’s controller is active for a 
fixed amount of time and when this time expires, it randomly 
selects one of the received genomes and activates a mutated 
copy of this genome as the new active controller. Thus, robot 
controllers procreate by transmitting their genome to eggs, 
and the more eggs a robot inseminates, the more chances it 
has for procreation. Because the transmission of genomes is 
continuous and at close range, the more a robot moves about 
the arena, the better its chances of producing offspring. 

The robot controller lifecycle in our experiments consists 
of two phases: life and rebirth. The robot controllers have a 
limited, fixed, lifetime during which they perform their ac¬ 
tions; moving about, foraging, et cetera (this lifetime may be 
extended by picking up pucks as described below). When 
their lifetime ends, they enter a rebirth phase and become 
‘eggs’: stationary receptacles for genomes that are transmit¬ 
ted by passing live robots. The rebirth phase also lasts a 
fixed amount of time, and once this has passed, the egg ran¬ 
domly selects parents from the received genomes to create a 
new controller. The robot then reverts to the ‘life’ role with 
this new controller. The resulting evolutionary process is
essentially the same as that in mEDEA: the more ambulant a
robot, the higher its reproductive success rate. In contrast
to MONEE, there is no further selection criterion because
the eggs select the parent genome randomly. In these experiments,
the active phase lasted 2,000 time steps, and the egg
phase 200 time steps.

Figure 1: Experiment screenshot. Robots are shown as small
circles with sensor beams indicated. Pucks are shown as
small green squares (the blue squares show a second puck
type that is disregarded in the experiments in this paper).
The shaded orange rectangles indicate arena walls and obstacles.

Environment and Control 

The experiments were conducted in a simple 2D simulator 
called RoboRobo (Bredeche et al., 2013), simulating 100
e-puck robots in an environment that contains obstacles and
pucks.¹ The sides of the square arena are roughly 330 robot
body lengths long (1,024 pixels in the simulator), and it con¬ 
tains a number of obstacles (see figure 1) and pucks. The 
pucks are spread throughout the arena, and they are imme¬ 
diately replaced in a random location when picked up. The 
robots move around the arena, spreading their genome as 
they encounter eggs and dying when their time has passed. 

Robots can collect pucks simply by driving over them; 
picking up a puck extends the robot’s lifetime by a certain 
amount. To detect pucks, the robots have 8 special sensors, 
laid out in the same manner as the standard e-puck infrared
sensors: 6 face forward, 2 face to the rear. Each robot is controlled
by a single-layer feed forward neural network which
controls its left and right wheels. The inputs for the neural
network are the robot’s puck and obstacle sensors as well
as two bias nodes (18 inputs in total). The robot’s genome
directly encodes the neural network’s weights as an array of
reals. The robots select a single parent from the received
genomes and their current controller is discarded, so there is
no crossover. Variation is applied by adding small Gaussian
perturbations (μ = 0, σ = 0.1) to the connection weights.

¹ Code for the experiments and analysis scripts is available from
https://github.com/ci-group/monee.git.

As mentioned, the robots alternate between periods of ac¬ 
tive puck gathering (life phase) and motionless genome re¬ 
ception (egg phase). The egg phase lasts 200 time steps, 
the life phase is initialised at 2,000 time steps, but to pre¬ 
vent synchronised cycles among the robots, we add a small 
random number to each robot’s initial lifetime. This desyn¬ 
chronises switching between life and rebirth even though the 
runs start with all robots in sync at the first time-step of their 
lifetime. 
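The controller lifecycle described above can be sketched in a few lines of Python. This is a minimal illustration rather than the RoboRobo/MONEE implementation: the class and function names, the tanh activation, the size of the random lifetime offset, and the rule that an egg without received genomes keeps waiting are all assumptions made for the sketch. Only the 2,000/200 step phases, the 18 inputs, the two wheel outputs, random parent selection and the Gaussian mutation with σ = 0.1 come from the text.

```python
import math
import random

LIFE_STEPS = 2000   # initial active lifetime, from the text
EGG_STEPS = 200     # duration of the egg phase, from the text
N_INPUTS = 18       # puck sensors, obstacle sensors and two bias nodes
N_OUTPUTS = 2       # left and right wheel
SIGMA = 0.1         # standard deviation of the Gaussian perturbation


def wheel_speeds(genome, inputs):
    """Single-layer network: one weighted sum per wheel (tanh is an assumption)."""
    assert len(genome) == N_INPUTS * N_OUTPUTS and len(inputs) == N_INPUTS
    speeds = []
    for o in range(N_OUTPUTS):
        weights = genome[o * N_INPUTS:(o + 1) * N_INPUTS]
        speeds.append(math.tanh(sum(w * x for w, x in zip(weights, inputs))))
    return speeds


def mutate(genome, rng):
    """Copy the genome and add zero-mean Gaussian noise to every weight."""
    return [w + rng.gauss(0.0, SIGMA) for w in genome]


class Robot:
    def __init__(self, genome, rng, max_jitter=50):
        self.genome = genome
        self.alive = True
        # A small random offset desynchronises the robots (its size is hypothetical).
        self.timer = LIFE_STEPS + rng.randrange(max_jitter)
        self.inbox = []

    def receive(self, genome):
        """Eggs store genomes transmitted by passing live robots."""
        if not self.alive:
            self.inbox.append(genome)

    def collect_puck(self, reward_level):
        """A puck extends the remaining lifetime by a fraction of the initial lifetime."""
        if self.alive:
            self.timer += round(reward_level * LIFE_STEPS)

    def step(self, rng):
        """Advance one time step and switch phase when the timer runs out."""
        self.timer -= 1
        if self.timer > 0:
            return
        if self.alive:
            # Life phase is over: become an egg with an empty genome list.
            self.alive, self.timer, self.inbox = False, EGG_STEPS, []
        elif self.inbox:
            # Egg phase is over: pick one parent at random, no crossover.
            # Assumption: an egg that received no genome simply keeps waiting.
            self.genome = mutate(rng.choice(self.inbox), rng)
            self.alive, self.timer = True, LIFE_STEPS
```

Because the parent is drawn uniformly from the received genomes, nothing in this loop refers to task performance; any selection has to arise from how long and how widely a robot moves among eggs.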

Biased Natural Selection 

MONEE extends mEDEA by having the robots select ex¬ 
plicitly for task behaviour instead of selecting randomly 
from the received genomes. In the current set of experi¬ 
ments, however, this explicit selection is disabled. Instead, 
we provide reproductive advantage and so promote puck¬ 
gathering behaviour by rewarding robots for pucks they pick 
up: each puck yields an increase in lifetime. The amount of
added lifetime is defined as a fraction of the initial lifetime.
A reward level of 0.1, for example, means that a robot’s lifetime
is extended by 10% of the initial 2,000 time steps, i.e., by
200 time steps, for each puck. We ran
experiments with varying reward levels ranging from 0.05
to 0.8, with 50 replicate runs for each setting.
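To make the reward arithmetic concrete, the following small sketch converts a reward level into the number of time steps added per puck, assuming (as stated earlier) that the bonus is a fraction of the initial 2,000-step lifetime. The function name and the particular levels printed are illustrative; the text only fixes the range 0.05 to 0.8.

```python
INITIAL_LIFETIME = 2000  # time steps, from the text


def lifetime_bonus(reward_level, initial_lifetime=INITIAL_LIFETIME):
    """Time steps added to a robot's lifetime for each puck it picks up."""
    return round(reward_level * initial_lifetime)


if __name__ == "__main__":
    # Illustrative levels within the range used in the experiments.
    for level in (0.05, 0.1, 0.2, 0.4, 0.8):
        print(f"reward level {level}: +{lifetime_bonus(level)} time steps per puck")
```

One consequence worth noting: a robot that picks up at least one puck every `lifetime_bonus(level)` steps never reaches the egg phase, so at a reward level of 0.8 a single puck every 1,600 steps is enough to live indefinitely.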

The behaviour of collecting pucks and consequently liv¬ 
ing longer improves the reproductive chances of the robot: 
the more pucks a robot collects, the longer its lifespan; 
robots that are skilled at picking up pucks thus live longer 
and consequently have more opportunities to disperse their 
genomes by inseminating robots in egg state. Robots with 
less effective behaviour return to the dormant egg stage 
sooner, accelerating the distribution of genes that lead to 
puck collecting behaviour. Through repetition of this pro¬ 
cess of selection and gene dispersion the entire population’s 
aptitude increases. 

Quantifying Selection Pressure 

Haasdijk et al. (2014) introduced a measure to quantify se¬ 
lection pressure that calculates the likelihood of random as¬ 
sociations between behaviour and number of offspring in a 
population. This measure is based on the premise that an 
increasing level of certainty that the relation between be¬ 
haviour and fecundity is not random indicates a higher selec¬ 
tion pressure. If there were no selection pressure, the rela¬ 
tionship between behaviour and fecundity would be random, 


and contrariwise, if an individual’s chances of generating 
offspring depend on its behaviour, the relationship is sys¬ 
temic. Fisher’s exact test (Fisher, 1925) determines the cer¬ 
tainty of nonrandom associations between the categories in a 
contingency table. We construct contingency tables by con¬ 
sidering the distance covered, number of pucks collected and 
offspring count over the lifetime of the robots in the popula¬ 
tion. We split these individuals into classes with and without 
offspring and we split them along the median distance trav¬ 
elled or the median number of pucks collected during their 
lifetime to create two 2x2 contingency tables: one relating 
offspring and distance travelled and one relating offspring 
to number of pucks collected. The cells of the contingency 
tables contain the count of individuals for that cell (e.g., the 
number of individuals with offspring and below median dis¬ 
tance travelled). 

Fisher’s exact test assesses whether the two
classifications in each contingency table (having offspring and
above/below median distance travelled or pucks collected,
respectively) are associated. The p-values resulting from
these tests give the probability of observing an association
at least this strong if there were no relationship
between having offspring and having above- or below-median
distance travelled or pucks collected. Thus, low p-values
indicate high selection pressure and vice versa. Because
the p-values are very small, we ease interpretation
and comparison by reporting the log-likelihood multiplied
by −1.
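The measure can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not the authors' analysis script: the function names are hypothetical, the two-sided variant of Fisher's exact test is assumed, individuals exactly at the median are counted as "below", and the natural logarithm is used because the text does not state the base.

```python
from math import comb, log
from statistics import median


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as likely as the observed one.
    p = sum(q for q in map(prob, range(lowest, highest + 1))
            if q <= p_observed * (1 + 1e-7))
    return min(1.0, p)


def selection_pressure(behaviour, offspring):
    """-log(p) for the association between behaviour and having offspring.

    behaviour: one value per individual (e.g. distance travelled or pucks collected)
    offspring: number of offspring per individual, in the same order
    """
    m = median(behaviour)
    table = [[0, 0], [0, 0]]
    for value, children in zip(behaviour, offspring):
        table[int(value > m)][int(children > 0)] += 1
    (a, b), (c, d) = table
    # Note: log() fails if p underflows to 0 for very large populations.
    return -log(fisher_exact_two_sided(a, b, c, d))
```

For a population in which behaviour and offspring are unrelated the p-value stays close to 1 and the measure close to 0; the stronger the association, the larger the value.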

Results and Analysis 

Figure 2 shows the number of pucks collected over time
for a range of reward levels. The reproductive benefit of 
picking up pucks clearly leads to puck-collecting behaviour: 
the robots pick up more pucks as evolution runs its course. 
However, the plot also clearly shows that the number of 
pucks collected is substantially lower for higher reward lev¬ 
els, and that the decrease in performance is systemic. This 
seems counter-intuitive: not only could one expect increased 
reward to provide a stronger incentive, but a longer, possi¬ 
bly even infinite, lifetime also implies that a robot can spend 
more time collecting pucks, because the egg stage in which 
a robot remains passive is deferred or eliminated. Figure 3 
indicates that larger rewards do lead to longer lifetimes: it 
shows the number of deaths over time for different reward 
levels. Increasing the reward level decreases the number of 
deaths in the population per time interval, and with it the 
amount of time spent in the passive egg state (remember, 
each death initiates a 200-time-step egg phase). Thus, the
robots spend more time in the active state where they can 
collect pucks. 

To analyse the mechanisms that lead to this surprising ef¬ 
fect of increasing reward, consider that when the reward is 
small, robots need to collect multiple pucks to increase their 
lifespan substantially. Robots that are not so skilled return to 
the egg stage quickly, providing receptacles for the genomes 
of more skilled robots. However, the reproductive advantage
from puck gathering for skilled robots decreases as the reward
level increases. There are two possible causes: firstly,
higher rewards allow relatively unskilled but lucky individuals
to gain substantial lifetime extensions, allowing them
to maintain and spread their genomes for much longer than
would be the case for low reward levels. Secondly, fewer
robots die at higher reward levels, leaving less opportunity
for robots with relatively effective behaviour to pass on their
genomes because there are fewer eggs available for insemination.
Figure 3 shows that for reward levels 0.4 and higher,
the number of deaths decreases very rapidly, and this stalls
evolution because no robots become available for insemination.

Figure 2: Number of pucks collected per 1,000 time steps
vs time for different reward levels. The number of pucks
collected initially increases more rapidly for higher reward
levels, but lower reward levels ultimately lead to better task
performance. The line plots were smoothed to emphasise the
trends; the shaded area indicates the 95% confidence interval
for the median.

Figures 4 and 5 provide more detail about the distribution 
of puck gathering behaviour and longevity over the popula¬ 
tions. The plots show a positive correlation between reward 
and skewness: high rewards lead to populations where a few 
individuals collect large numbers of pucks and have very 
long lifetimes, while the majority of individuals perform at 
a much lower level. The majority gathers fewer pucks for 
high reward levels so that the median number of pucks col¬ 
lected per individual decreases as the reward increases. High 
rewards thus do lead to excellent task behaviour in a select 
few individuals (high best fitness, in evolutionary algorithm
terms). This, however, is not a relevant measure in an on-line
setting such as this: here, the performance of each individual
in the population counts, and with high rewards there is
such a preponderance of very poorly performing individuals
that the overall performance of the population becomes
ineffectual.

Figure 3: Number of deaths per 1,000 time steps vs time for
different reward levels. The number of deaths decreases as
the population evolves to pick up pucks. The line plots were
smoothed to emphasise the trends; the shaded area indicates
the 95% confidence interval for the median.

Figure 4: Distribution of age for different reward levels for
individuals that died in the last 10,000 time steps of the runs.
Each violin plot shows the probability density of age values
for a reward level (note that the x axis is not to scale). Next
to the violin plots are boxplots showing median and inter-quartile
range. The plots show combined data over all repeats
for each setting (individual runs show much the same
pattern).

Figure 5: Distribution of number of pucks collected by individuals
that died in the last 10,000 time steps of the runs for
different reward levels. Each violin plot shows the probability
density of values for a reward level (note that the x axis is
not to scale). Next to the violin plots are boxplots showing
median and inter-quartile range. The plots show combined
data over all repeats for each setting (individual runs show
much the same pattern).

We performed a second series of experiments in which more
robots also die at higher reward settings by introducing
‘random death’. In these experiments, robots can die at every
time step with a probability of 1.5 × 10⁻⁴, regardless of
the amount of lifetime they actually should have left. Thus,
eggs are more readily available for all reward levels. The
results of these experiments are displayed in figure 6. The
graph shows the number of pucks collected during the last
1,000 time steps at the end of each run for the normal condition
(in blue) and the random death condition (in red). It
shows the value for each run as a small circle and the median
for each condition as an × symbol. The plot also shows
the result of a linear regression model of the relationship between
the median number of pucks collected and the log of
the reward level.
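To get a feel for how strong this random death rate is, the sketch below computes two derived quantities, assuming only that death is an independent event with probability 1.5 × 10⁻⁴ at every time step, as described above. The function names are hypothetical and the figures are arithmetic consequences of that assumption, not results reported in the paper.

```python
P_DEATH = 1.5e-4  # per-time-step probability of random death, from the text


def survival_probability(steps, p_death=P_DEATH):
    """Probability of not dying randomly during the given number of steps."""
    return (1.0 - p_death) ** steps


def expected_steps_until_random_death(p_death=P_DEATH):
    """Mean of the geometric distribution: average steps until random death."""
    return 1.0 / p_death


if __name__ == "__main__":
    print(f"P(survive 2,000 steps) = {survival_probability(2000):.3f}")
    print(f"mean steps until random death = {expected_steps_until_random_death():.0f}")
```

Under this assumption roughly a quarter of the robots die randomly before an unextended 2,000-step lifetime has passed, and even a robot that keeps extending its lifetime is expected to become an egg after about 6,700 steps.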

For lower reward levels the random death condition is not 
beneficial: the robots actually collect fewer pucks. At these 
lower reward levels, sufficient eggs are available all the time 
and more deaths are only counterproductive. Around re¬ 
ward level 0.2, the lines cross, and from this point onward 
the robots collect more pucks on average when robots die 
randomly. Note that the clear pattern of robots collecting 
fewer pucks for higher reward levels persists. This leads 
us to deduce that the decreasing task performance is also 
caused by large rewards decreasing selection pressure be¬ 
cause even relatively poorly behaving robots receive sub¬ 
stantial rewards. 

The graphs in figure 7 show how selection pressure develops
over time for a number of conditions. There are
two components of behaviour that the environment selects
for (albeit indirectly). Firstly, robots that move about the
environment more have a greater chance of encountering
eggs where they can leave copies of their genome. Secondly,
picking up pucks increases lifetime and so increases
the number of opportunities to spread the genome. Figures 7a
and 7b show how the selection pressure deriving from these
two components develops over time. For reward levels up
to 0.2, selection pressure increases initially as the relevant
behaviour spreads through the population. Then, as this behaviour
becomes prevalent, the consequent relative reproductive
benefit and with it the selection pressure decreases
slowly to an intermediate level. Similar trends were also
reported by Haasdijk et al. (2014). The selection pressure
decreases as the reward level increases, supporting our deduction.
For reward levels higher than 0.2, the picture is
different: after an initial rise, selection pressure rapidly decreases,
then fluctuates to settle at a low value. Selection
pressure settles much sooner than it does for low reward levels.

Figure 6: The number of pucks collected in the last 1,000
time steps vs the reward level with and without random
death. Note that the horizontal axis is logarithmic. The circles
represent individual runs, the × symbols indicate the
medians for each reward level setting and the red and blue
lines show the result of log-linear regression through the medians.
For reward levels lower than ca. 0.2, adding random
death decreases task performance; for higher values it increases
the number of pucks collected.

Figure 7c compares selection pressures with and with¬ 
out the random death condition for a reward level of 0.2. 
It shows that increasing the number of deaths in this way
substantially increases the selection pressure, in particular
regarding the number of pucks collected. Interestingly, the
selection pressure becomes much greater and stays much
higher than it does without random death for lower reward
levels, even though the number of pucks collected remains
substantially lower.

(a) Selection pressure from movement.
(b) Selection pressure from number of pucks collected.
(c) Selection pressure with and without random death for reward level 0.2.

Figure 7: Development of selection pressure over time for different reward levels (7a and 7b) and comparing runs with and
without random death (7c). The selection pressure decreases with increasing reward level; for a reward level of 0.8, the selection
pressure becomes minimal. Adding random death increases the selection pressure substantially, in particular the selection
pressure resulting from puck collecting. Selection pressure is quantified as −1 times the log-likelihood of a random association
between number of offspring and speed or number of pucks collected. The plots were smoothed to emphasise the trends.

The fluctuations for higher reward levels after the initial 
rise in selection pressure correspond with the fluctuations 
in Figs 2, 3 and 8. The fluctuations become increasingly 
pronounced and persistent as the reward level increases. In 
all cases, these fluctuations occur for reward levels higher 
than 0.2, which is also the tipping point at which adding 
random death starts improving puck gathering behaviour. 
This seems to indicate that this fluctuating trend is related 
to the lack of available eggs. Because of the high reward 
levels, there are few available eggs until robots that have 
poor behaviour but are lucky eventually do start dying off. 
At that point, there is a period where robots can spread their 
genome and robots with more appropriate behaviour enjoy 
some reproductive advantage, increasing selection pressure 
until there are few eggs available again. The cycle then re¬ 
peats until the behaviour stabilises. 

Higher reward levels do imply a faster increase in the 
number of pucks collected. The limited increase in lifespan 
for lower reward levels implies that moving at speed to be 
able to impregnate many eggs is an important component of 
reproductive success. This is also borne out by figure 7: ini¬ 
tially, the selection pressure from movement is higher than 
from puck collection. As the median speed of the robots 
rises (figure 8), collecting pucks becomes more advantageous
as now it is possible to live substantially longer even
with low reward levels. At higher reward levels, speed is not
necessary to live for a very long time, and consequently still
be able to spread one’s genome. Figure 7 shows that speed
matters only initially, and the benefits of collecting pucks
appear sooner than for lower reward levels. Thus, it seems
that high reward levels emphasise the puck collecting task
too much to the detriment of movement, causing evolution
to focus on the quick win of collecting a couple of pucks.
More modest rewards cause evolution to explore the benefits
of movement equally, and in the end this is beneficial for
collecting pucks as well.

Figure 8: Median robot speed vs time for different reward
levels. The development of speed shows the same trend as
that of the number of pucks collected in figure 2: speed increases
more rapidly for higher reward levels, but lower levels ultimately
lead to higher speeds. The plots were smoothed to
emphasise the trends.

Conclusion 

In this paper, we investigated an evolutionary system where 
robots randomly exchange genetic material in an environ¬ 
ment that bestows benefits on robots that collect pucks. This 
benefit takes the form of an increase of the robot’s lifespan. 
We showed that the population of robots does learn to pick 
up pucks as a result of the natural selection that is biased by 
the reward of additional lifetime without any further explicit 
selection. 

When considering different reward levels, we saw that, 
unexpectedly, increasing the reward actually leads to poorer
performance, although the number of pucks collected ini¬ 
tially rises more rapidly. High rewards emphasise one com¬ 
ponent of behaviour (in this case, collecting pucks) to the 
detriment of other components of behaviour (in this case, 
movement). Consequently, evolution focusses too much on 
the quick win of collecting pucks and neglects movement, 
becoming stuck in sub-optimal behaviour. If the reward is 
too big, there is no gradient for evolution to exploit: the 
benefit of mediocre or even poor performance is so big that 
there is little incentive to improve behaviour and evolution 
bogs down. 

The most immediate conclusion, then, is: if the goal is for 
a population of robots to collect as many pucks as possible 
in a setting with biased natural selection, the reward for puck 
collecting should be minimal. However, it is tenuous to gen¬ 
eralise this precise conclusion to other systems, e.g., where 
the benefit is awarded by increasing the speed of movement. 

More generally put, we showed that unduly rewarding behaviour
in one aspect limits evolution’s capability to explore
the behaviour space, and the mechanisms that generate se¬ 
lection pressure must be balanced with care. This resonates 
with findings of research into explicitly selecting for diverse 
behaviour, for instance by Lehman and Stanley (2011). 

It is a truism that evolution requires death, and in our ex¬ 
periments, the effects of overly focussing on puck collec¬ 
tion are exacerbated by the lack of opportunities to spread 
a robot’s genome because the increased lifespan reduces the 
number of available receptacles. This lack seems to become 


particularly pressing when the reward level exceeds the tip¬ 
ping point of increasing lifespan by 20%. 

The research presented here is part of an ongoing effort 
to research the interacting selection processes in embodied 
evolution, and further investigations, in particular to relate 
the findings here to experiments with multiple tasks and with 
explicit selection for task behaviour, are underway. 
Abstract

Measurements of coordinated motion in flocks are necessary 
to evaluate their performance. In this work, a set of quantitative
metrics to evaluate the performance of the spatial features
exhibited by flocks is introduced and applied to the well-known
boids of Reynolds. Our metrics are based on quantitative
indicators that have been used to evaluate fish schools.
These indicators are revisited and extended as a set of three 
new metrics that can be used to evaluate and design flocks. 

Keywords: metrics, flock, behaviors, boids

Introduction 

An intriguing collective phenomenon of nature is caused 
by animals moving as a coordinated unit, for example a 
flock of birds or a fish school. This phenomenon has at¬ 
tracted interest in various scientific fields such as biology, 
artificial life and distributed artificial intelligence, because
the phenomenon suggests an intelligent organization that 
transcends the abilities of each individual (Camazine et al., 
2003). 

Coordinated collective movement has important applications
such as virtual reality and computer animation.
Flock models are often used to provide realistic-looking
representations of flocks, schools or herds. For instance,
they are used in the Half-Life video game to model the flying
bird-like creatures or in the film Batman Returns to model
bat swarms and armies of penguins (Bajec and Heppner,
2009).

In addition, flock models can be used for direct control
and stabilization of teams of simple Unmanned Ground
Vehicles (UGV) or Micro Aerial Vehicles (MAV) in swarm
robotics (Min and Wang, 2011; Saska et al., 2014). They
can also be used in applications that require the coordinated
action of multiple autonomous individuals, such as flying
robots (drones) used in collective search, agricultural
monitoring and event surveillance (Vásárhelyi et al., 2014).

Other interesting applications of flock models are the
automatic programming of Internet multi-channel radio
stations and optimization tasks
(Ibanez et al., 2003; Cui and Shi, 2009).


The measurement and evaluation of the collective performance
of autonomous agents is an open issue (Navarro and
Matia, 2009). A lot of work remains to be done in the
development of indicators that capture important aspects of the
collective dynamics of groups of autonomous entities based,
for instance, on goal achievement, the formation of spatial
patterns, and the exploitation of resources.

Tuning the parameter values to achieve coordinated
flocks is far from trivial, because the parameters that
influence the behaviors of the members of a flock are closely
interrelated. Performance metrics are the criteria that determine
success in the behavior of a system. Therefore it is
necessary to design objective performance metrics that allow
us to discern which flocks behave better.

In this work we propose new metrics that capture
the global performance of flocks in spatial terms, so
that they can be used as a benchmark. Our metrics are based
on quantitative indicators that have been used to characterize
spatial features exhibited by a fish school: extension,
polarization and frequency of collisions (Huth and Wissel, 1992;
Zheng et al., 2005). These measures are revisited and extended
in a set of three new measures: consistency in extension,
consistency in polarization, and quality.

Related Work 

A seminal work that aims at reproducing the flock phenomenon
is the model of coordinated collective motion proposed
by Reynolds (Reynolds, 1987), where individual entities,
generically known as boids, achieve realistic behaviors
by the application of a set of basic rules. We apply our
performance metrics to this model and therefore review it
in detail in the next section.

In the area of biology, Huth and Wissel (Huth and Wissel,
1992) present a simulation of a school of fish. The behaviors
of each fish are attraction, repulsion, parallel orientation
and search. Polarization and extension are proposed as
descriptive metrics of a school. The polarization reflects the
degree of alignment of the agents’ headings: if the fish
belonging to the school are oriented in similar directions, the
school has a small polarization. The extension reflects the
degree of cohesion of a school, that is, how far apart the fish
are from each other.

Huepe and Aldana (Huepe and Aldana, 2008) compare three
simple models that qualitatively reproduce the emergent
swarming behavior of bird flocks or fish schools, using the
metrics of polarization, local density and nearest neighbor
mean distance. While the polarization (the standard order
parameter used for flocks) behaves equivalently in all cases,
the local density and nearest neighbor mean distance provide
a more detailed description that can clearly distinguish the
properties of the swarming behavior.

Navarro and Matia (Navarro and Matia, 2009) propose and
discuss a set of metrics that measure the performance of the
collective movement of mobile robots. The metrics cover
several aspects of the collective movement, including
those related to the area and shape of the group; the movement;
and the positioning and orientation of its members.

Bajec et al. (Bajec et al., 2007) present an artificial animal
construction framework obtained as a generalization of
existing bird flocking models. They present a set of metrics
that can measure and judge the flocking behavior of a group
of boids and use them in a series of controlled experiments to
evaluate the flocking behavior of boids.

The boids model 

This work is based on a model of coordinated collective motion
proposed by Reynolds in his classic article (Reynolds,
1987), which is inspired by the movement of flocks and
schools. Reynolds generically called the entities belonging
to a formation (birds, fish, etc.) boids. Each boid
applies a simple set of steering behaviors, such as cohesion,
separation and alignment, that govern its movement. A
flock is the result of the interaction of each boid with its
neighbors.

The set $B$ of $n$ boids $b_i$ involved in the flock is denoted
by formula (1).

$$B = \{b_i,\ i = 1, 2, \ldots, n\} \tag{1}$$

Each boid $b_i$ has a position vector $p_i(t)$ and a velocity
vector $v_i(t)$ that describe its motion in space at time $t$.
The force that adjusts the speed of the boid is typically
an impulse generated by the boid itself, so the impulse is
limited by a scalar maximum force denoted as $f_m$. In addition,
boids are restricted to a maximum speed expressed as $s_m$.
This speed limit is imposed by truncating the velocity vector
of the boid. The acceleration that a boid acquires
also depends on the inertia of the body, expressed as a scalar
mass $m_b$.

The local space associated with each boid $b_i$ is described
by $f_i(t)$, $s_i(t)$ and $u_i(t)$, which refer respectively to the
vectors “forward” (x axis), “side” (z axis) and “up” (y axis).
Each boid has a local view of its environment called
“area of perception”, related to a steering behavior. The area
of perception is determined by a radius $r$ and an angle $\theta$
(field of view); only neighbors that are inside the area of
perception are selected for calculating a given steering behavior
(see Figure 1).

Figure 1: Figure (a) shows the local space of a boid. Figure (b)
shows the area of perception of a boid.

The set of boids $b_j$ perceived by the boid $b_i$ is denoted $P_i$
and is calculated as follows:

Step 1. Calculate distance: the distance $d_{ij}$ is determined
between the boid $b_i$ and the boid $b_j$,

$$d_{ij} = \|p_j(t) - p_i(t)\| \tag{2}$$

Step 2. Calculate angle: the angle $\theta_{ij}$ is determined between
the vector “forward” of boid $b_i$ and the unit vector in the
direction of the position of the boid $b_j$.

Step 3. Calculate neighbors: if the distance $d_{ij}$
is less than the radius $r$ of boid $b_i$ and the angle $\theta_{ij}$ is
within the angle $\theta$ of boid $b_i$, then the boid $b_j$ belongs to
its local neighborhood,

$$P_i = \{b_j \in B;\ \forall b_j : d_{ij} < r \wedge \theta_{ij} < \theta,\ j = 1, 2, \ldots, m\} \tag{4}$$

where $m$ is the number of boids perceived by the boid $b_i$
within the radius of its steering behavior.

The steering behaviors of a boid $b_i$ executed at time $t$
are cohesion, $c_i(t)$; separation, $s_i(t)$; and alignment, $a_i(t)$,
where the areas of perception associated with these steering
behaviors are determined by the radii and angles $r_c$, $\theta_c$;
$r_s$, $\theta_s$; and $r_a$, $\theta_a$ respectively. In addition, the
steering behaviors cohesion, separation and alignment are
associated with the sets of perceived boids $C_i$, $S_i$ and $A_i$,
respectively. The neighbor boids belonging to these sets are
calculated as in steps 1, 2 and 3 above.
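
Steps 1 to 3 can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's OpenSteer code: the function name `perceived` and its arguments are hypothetical, the angle test follows formula (4) literally (the text does not say whether the field of view is a full or a half angle), and a boid is assumed not to perceive itself.

```python
import math


def perceived(i, positions, forwards, radius, fov_deg):
    """Return the indices j of the boids inside the area of perception of boid i."""
    p_i, f_i = positions[i], forwards[i]
    f_len = math.sqrt(sum(c * c for c in f_i))
    neighbours = []
    for j, p_j in enumerate(positions):
        if j == i:
            continue  # assumption: a boid does not perceive itself
        offset = [b - a for a, b in zip(p_i, p_j)]
        d_ij = math.sqrt(sum(c * c for c in offset))  # Step 1, formula (2)
        if d_ij >= radius:
            continue
        if d_ij == 0.0 or f_len == 0.0:
            theta_ij = 0.0  # assumption: undefined angle treated as straight ahead
        else:
            # Step 2: angle between "forward" and the direction towards b_j
            cos_t = sum(f * o for f, o in zip(f_i, offset)) / (f_len * d_ij)
            theta_ij = math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))
        if theta_ij < fov_deg:  # Step 3, formula (4), taken literally
            neighbours.append(j)
    return neighbours
```

Calling the function once per steering behavior, with the radius and angle of that behavior, would give the sets $C_i$, $S_i$ and $A_i$.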
The purpose of the cohesion behavior is to move a boid towards
the center of the group perceived within its neighborhood.
If this steering behavior were applied alone, the
formation would gather together in one region. The formula
of cohesion, $c_i(t)$, is expressed in (5), where $m_c$ is the
cardinality of the set $C_i$.

$$c_i(t) = \left(\frac{1}{m_c} \sum_{\forall b_j \in C_i} p_j(t)\right) - p_i(t) \tag{5}$$

The purpose of the separation behavior is to move a boid so
that it avoids collisions with its neighbors and prevents
agglomeration of the formation. If only this behavior is applied,
the formation dissipates. The formula of separation, $s_i(t)$, is
expressed in (6).

$$s_i(t) = -\sum_{\forall b_j \in S_i} \big(p_j(t) - p_i(t)\big) \tag{6}$$

The purpose of the alignment behavior is to move a boid in
the same direction as its neighbors. The alignment behavior
acts as a first heuristic to avoid collisions, because when
all boids of a formation move at the same velocity the risk
of collision between them is reduced. The formula of alignment,
$a_i(t)$, is expressed in (7), where $m_a$ is the cardinality
of the set $A_i$.

$$a_i(t) = \left(\frac{1}{m_a} \sum_{\forall b_j \in A_i} v_j(t)\right) - v_i(t) \tag{7}$$
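
The three steering rules (5) to (7) translate almost directly into code. The following is a minimal sketch under stated assumptions: the function names are hypothetical, vectors are plain tuples, and an empty neighbor set is assumed to yield a zero steering vector, a case the formulas do not cover.

```python
def _sub(a, b):
    """Component-wise difference a - b."""
    return tuple(x - y for x, y in zip(a, b))


def _mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(c) / len(vectors) for c in zip(*vectors))


def cohesion(p_i, neighbour_positions):
    """Formula (5): vector from the boid to the centre of its neighbours."""
    if not neighbour_positions:
        return tuple(0.0 for _ in p_i)  # assumption: no neighbours, no steering
    return _sub(_mean(neighbour_positions), p_i)


def separation(p_i, neighbour_positions):
    """Formula (6): negated sum of the offsets towards the neighbours."""
    total = tuple(0.0 for _ in p_i)
    for p_j in neighbour_positions:
        total = _sub(total, _sub(p_j, p_i))
    return total


def alignment(v_i, neighbour_velocities):
    """Formula (7): mean neighbour velocity minus the boid's own velocity."""
    if not neighbour_velocities:
        return tuple(0.0 for _ in v_i)  # assumption: no neighbours, no steering
    return _sub(_mean(neighbour_velocities), v_i)
```

Each function expects the positions or velocities of the boids in the corresponding set $C_i$, $S_i$ or $A_i$.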

When each boid $b_i$ applies the basic behaviors of cohesion
$c_i(t)$, separation $s_i(t)$ and alignment $a_i(t)$ in combination,
the formation is held together and moves in a coordinated
way. The formula for flocking, $fk_i(t)$, is expressed in (8).

$$fk_i(t) = \alpha\, c_i(t) + \beta\, s_i(t) + \delta\, a_i(t) \tag{8}$$
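
The weighted sum in (8) feeds the force, acceleration, velocity and position updates given in formulas (9) to (12) of this section. As a rough sketch of one such update for a single boid (the names `clamp` and `step` are hypothetical, and, following the formulas literally, no explicit factor $\Delta t$ is applied):

```python
import math


def _add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def _scale(a, k):
    return tuple(x * k for x in a)


def clamp(vec, limit):
    """Rescale vec to length `limit` if it is longer, as in (9) and (11)."""
    length = math.sqrt(sum(c * c for c in vec))
    return _scale(vec, limit / length) if length > limit else vec


def step(p_prev, v_prev, c, s, a, alpha, beta, delta, f_max, s_max, mass):
    """One update of a boid from its steering vectors c, s and a."""
    fk = _add(_add(_scale(c, alpha), _scale(s, beta)), _scale(a, delta))  # (8)
    force = clamp(fk, f_max)                 # (9): limit by maximum force
    acc = _scale(force, 1.0 / mass)          # (10): acceleration
    v = clamp(_add(v_prev, acc), s_max)      # (11): limit by maximum speed
    p = _add(p_prev, v)                      # (12): new position
    return p, v
```

Orientation smoothing and wall avoidance, which the project code also applies, are left out of this sketch.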


In (8), each behavior is multiplied by one of the weights $\alpha$,
$\beta$ and $\delta$, and the range of values of these weights is
$[0, \infty)$. The net force $F_i(t)$ is determined by the force
$fk_i(t)$ and limited by the maximum force $f_m$.

$$F_i(t) = \begin{cases} f_m \dfrac{fk_i(t)}{\|fk_i(t)\|} & \text{if } \|fk_i(t)\| > f_m \\ fk_i(t) & \text{otherwise} \end{cases} \tag{9}$$

Acceleration is equal to the net force divided by the mass of
the boid and is expressed in (10).

$$ac_i(t) = \frac{1}{m_b} F_i(t) \tag{10}$$

The new position of boid $b_i$ at time $t$ is calculated from
its velocity $v_i(t)$ and its previous position at time $t - \Delta t$,
as presented in formulas (11) and (12).

$$v_i(t) = \begin{cases} s_m \dfrac{v_i(t - \Delta t) + ac_i(t)}{\|v_i(t - \Delta t) + ac_i(t)\|} & \text{if } \|v_i(t - \Delta t) + ac_i(t)\| > s_m \\ v_i(t - \Delta t) + ac_i(t) & \text{otherwise} \end{cases} \tag{11}$$

$$p_i(t) = p_i(t - \Delta t) + v_i(t) \tag{12}$$

In addition, it is necessary to limit the magnitude of the
change in orientation of the boids, to prevent them from
looking nervous due to abrupt changes, and to apply a behavior
to avoid the walls. For the sake of space, the details of
the implementation of these features are available online in
the project code, which is based on the OpenSteer library
(https://github.com/Zapotecatl/MetricsBoids).

Performance Metrics

In this work the performance of the boids was evaluated in
terms of extension, polarization and frequency of collision
(Huth and Wissel, 1992; Zheng et al., 2005), quantitative
indicators that characterize a formation. These metrics are
revisited and extended: we propose the metrics of
consistency in extension, consistency in polarization,
and quality, all of which are quantitative performance metrics
of the collective dynamics of a formation.

The extension reflects the degree of cohesion of the flock
and is determined by the average distance between each boid
and the center of the flock. In this work the extension is
given in centimeters. The minimum value of the
extension is 0cm, which represents the situation where
all boids are gathered at one point in the environment.
The maximum value of the extension depends on the
shape and size of the pond, the number of boids and their
distribution in the pond. The center of the flock, a value
required to calculate its extension, is denoted as $cen(t)$ and
is expressed in (13).

$$cen(t) = \frac{1}{n} \sum_{i=1}^{n} p_i(t) \tag{13}$$

The extension of the flock at time $t$ is denoted as $ext(t)$
and is calculated by applying (14).

$$ext(t) = \frac{1}{n} \sum_{i=1}^{n} \|cen(t) - p_i(t)\| \tag{14}$$

The polarization is defined as the average of the angular
deviation of each boid with respect to the average orientation
of the entire group and expresses the degree of alignment
of the boids’ headings: if the boids belonging to the flock
are oriented in similar directions, the flock has a small
polarization. Polarization holds a value in the range [0°, 90°],
where 0° represents a flock with an optimal parallel orientation
where boids are perfectly aligned, and 90° represents a
flock with the highest degree of “confusion” where boids are
completely unaligned. The average orientation of the group
is denoted as $f_p(t)$ and is expressed in (15).

$$f_p(t) = \frac{1}{n} \sum_{i=1}^{n} f_i(t) \tag{15}$$

The angle between the vectors $f_i(t)$ and $f_p(t)$ is represented
by the symbol $\angle$ and expressed in (16), and the
polarization of the flock at time $t$ is denoted as $pol(t)$ and is
calculated from the values obtained in (16) as expressed in
formula (17).

$$\angle\big(f_i(t), f_p(t)\big) = \arccos\left(\frac{f_i(t) \cdot f_p(t)}{\|f_i(t)\|\,\|f_p(t)\|}\right) \tag{16}$$

$$pol(t) = \frac{1}{n} \sum_{i=1}^{n} \angle\big(f_i(t), f_p(t)\big) \tag{17}$$

The frequency of collision represents the degree of conflict
among boids and is defined as the average number
of boids in the collision state, that is, the fraction of boids
that are colliding. The frequency of collision
holds a value in the range [0, 1], where 0 represents the
ideal scenario where no collision occurs, whereas 1 represents
the worst scenario where all agents collide with each
other. When a boid $b_i$ is in the collision state, the value of
$C_i(t)$ is 1; otherwise its value is 0. The formula to calculate
the collision state of boid $b_i$ is expressed in (18), where $r_b$ is
the radius of the body of the boid.

$$C_i(t) = \begin{cases} 1 & \text{if } \|p_j - p_i\| < r_b \text{ for some } j \neq i \\ 0 & \text{otherwise} \end{cases} \tag{18}$$

The number of boids in the collision state is expressed in (19).

$$col(t) = \sum_{i=1}^{n} C_i(t) \tag{19}$$

The frequency of collision of the flock at time $t$ is denoted
$fcol(t)$, as expressed in (20).

$$fcol(t) = \frac{1}{n}\, col(t) \tag{20}$$
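
The three base indicators can be computed from a snapshot of positions and forward vectors. The sketch below is an illustration only: function names are hypothetical, vectors are plain tuples, the angle is taken as the arccosine of the normalized dot product, and an undefined angle (zero-length vector) is assumed to count as 90°.

```python
import math


def _norm(a):
    return math.sqrt(sum(c * c for c in a))


def _mean(vectors):
    return tuple(sum(c) / len(vectors) for c in zip(*vectors))


def _angle_deg(a, b):
    na, nb = _norm(a), _norm(b)
    if na == 0.0 or nb == 0.0:
        return 90.0  # assumption: undefined angle counted as unaligned
    cos_t = sum(x * y for x, y in zip(a, b)) / (na * nb)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def extension(positions):
    """Formulas (13) and (14): mean distance to the flock centre."""
    cen = _mean(positions)
    dists = [_norm([c - p for c, p in zip(cen, pos)]) for pos in positions]
    return sum(dists) / len(positions)


def polarization(forwards):
    """Formulas (15) to (17): mean angular deviation in degrees."""
    f_p = _mean(forwards)
    return sum(_angle_deg(f, f_p) for f in forwards) / len(forwards)


def collision_states(positions, r_b):
    """Formula (18): 1 for each boid closer than r_b to another boid."""
    states = []
    for i, p_i in enumerate(positions):
        hit = any(j != i and _norm([a - b for a, b in zip(p_j, p_i)]) < r_b
                  for j, p_j in enumerate(positions))
        states.append(1 if hit else 0)
    return states


def collision_frequency(positions, r_b):
    """Formulas (19) and (20): fraction of boids in the collision state."""
    return sum(collision_states(positions, r_b)) / len(positions)
```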

Proposed Metrics 

In order to capture the global performance of a flock,
considering the different aspects of its behavior that are
represented by the combination of the previous metrics, we propose
three new metrics: 1) consistency in the extension, 2) consistency
in the polarization, and 3) quality. These metrics
express the relationship between the extension, the polarization
and the frequency of collision.

The consistency in extension aims at balancing the radius
that separates the boids in a formation. Minimizing the values of
extension and frequency of collision affects the flock, since
the cohesion of groups is maintained by the combination of
these metrics. The extension of a flock decreases when boids
get close to each other, which is considered a positive action.
However, if they get too close to each other, collisions might
happen, which is considered a negative action. Therefore
a careful balance of these metrics is necessary. The consistency
in the extension evaluated at time $t$ is denoted as
$cns_{ext}(t)$ and is expressed in (21).


$$cns_{ext}(t) = 1 - \frac{\sum_{i=1}^{m} \|cen(t) - p_i(t)\| + k \cdot col(t)}{max_e \cdot n} \tag{21}$$


where $k$ is a distance penalty that holds a value in the range
$[0, max_e]$ and $m$ denotes the number of boids that do not
collide with each other. The constant $max_e$ represents the
maximum extension, which depends on the shape and size of
the pond, the number of boids and their distribution in the
pond. In this work we take the value of $max_e$ to be half of
the distance between two extreme points of the pond, which
normalizes the second term in (21). Finally the complement
is calculated. The consistency in the extension holds a value
in the range [0, 1], where 0 indicates the worst consistency
and 1 the optimum consistency.

Expression (21) captures the necessary balance between
the radius separating boids and the frequency of collision.
Each boid contributes its distance to the center of the flock
to the group consistency. However, if the boid collides, this
contribution is overridden and replaced by the constant
penalty $k$.

If $k = max_e$, the integrity of the boids is highly weighted.
A flock is evaluated with a value of $cns_{ext} = 0$ in a situation
where all its members collide, whereas the consistency in
extension approaches 1 when boids do not collide. On the
other hand, $cns_{ext} = 1$ can only be achieved if $k = 0$ and
all the members of the flock collide.
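
As a quick check of these boundary cases, assume that every one of the $n$ boids is in the collision state, so that $m = 0$ and $col(t) = n$. Expression (21) then reduces to:

```latex
cns_{ext}(t) = 1 - \frac{0 + k \cdot n}{max_e \cdot n} = 1 - \frac{k}{max_e}
```

which gives 0 for $k = max_e$ and 1 for $k = 0$, in agreement with the two cases described above.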

The consistency in polarization aims at balancing the
orientation of a flock. Minimizing the values of polarization
and frequency of collision affects the flock, since the boids’
goal is to achieve coordinated motion of uniformly aligned
boids that do not collide with each other. The polarization of
a flock decreases when boids are similarly oriented, which
is considered a positive action when moving together. However,
the boids may need to change their orientation, and if
such a change is not performed promptly boids may collide
with each other. Therefore a careful balance of these measures
is necessary. The consistency in the polarization evaluated
at time $t$ is denoted as $cns_{pol}(t)$ and is expressed in
(22).


$$cns_{pol}(t) = 1 - \frac{\sum_{i=1}^{m} \angle\big(f_i(t), f_p(t)\big) + p \cdot col(t)}{180 \cdot n} \tag{22}$$


where $p$ is an angle penalty that holds a value in the range
[0°, 180°] and $m$ denotes the number of boids that do not
collide with each other. The constant 180° represents the
maximum polarization, which normalizes the second term
in (22). Finally the complement is calculated. The consistency
in the polarization holds a value in the range [0, 1],
where 0 indicates the worst consistency and 1 the optimum
consistency.

Expression (22), in turn, reflects the necessary balance
between the angular orientation of the boids and the
frequency of collision.

Each boid contributes its angular deviation with respect to
the average group orientation. As with the consistency in
extension, if the boid collides this contribution is overridden
and replaced by the constant penalty $p$ to obtain the
consistency in polarization.

Again, in this measure, if $p$ = 180° the integrity of the
boids is highly weighted. A value of $cns_{pol} = 0$ is achieved
when the group is highly polarized and all its members collide.
On the other hand, $cns_{pol} = 1$ when the group is
perfectly aligned.

The quality aims at establishing a criterion to combine the
results of the consistency in both extension and polarization,
so that we can evaluate the global performance
of a flock. The quality is weighted by the factors $\sigma$ and $\gamma$,
which determine, respectively, the influence of the consistency
in the extension and in the polarization on the final result. The
quality is expressed in (23).

$$qlty(t) = \sigma \cdot cns_{ext}(t) + \gamma \cdot cns_{pol}(t) \tag{23}$$

where $0 < \sigma < 1$, $0 < \gamma < 1$ and $\sigma + \gamma = 1$.

Note that for each one of the previous metrics we calculate
both the instantaneous value at each simulation step and the
resulting value for a whole simulation run. The latter
is calculated as the average of the corresponding measure
over the simulation steps. For instance, the total quality
is expressed in (24).

$$qlty = \frac{1}{l} \sum_{t=1}^{l} qlty(t) \tag{24}$$

where $l$ denotes the number of iterations performed during
the simulation.
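
Since (21) and (22) share the same structure, they can be sketched with one helper. The code below is an illustrative reading of the formulas, not the project's implementation: the names are hypothetical, the per-boid values (distances to the center, or angular deviations) and collision flags are assumed to be computed beforehand, and the sum over $i = 1, \ldots, m$ is interpreted as running over the non-colliding boids only.

```python
def consistency(values, colliding, penalty, maximum):
    """Shared form of (21) and (22).

    values    : per-boid distance to the centre (21) or angle in degrees (22)
    colliding : per-boid collision state, truthy if the boid collides
    penalty   : k for (21), p for (22)
    maximum   : max_e for (21), 180 for (22)
    """
    n = len(values)
    kept = sum(v for v, hit in zip(values, colliding) if not hit)
    col = sum(1 for hit in colliding if hit)
    return 1.0 - (kept + penalty * col) / (maximum * n)


def quality(cns_ext, cns_pol, sigma=0.5, gamma=0.5):
    """Formula (23); sigma and gamma are assumed to sum to 1."""
    return sigma * cns_ext + gamma * cns_pol


def run_average(samples):
    """Formula (24): average of a per-step metric over a simulation run."""
    return sum(samples) / len(samples)
```

For example, with distances `[10, 20, 30, 40]`, the last two boids colliding and `k = max_e = 225`, `consistency` evaluates $1 - (30 + 450)/900$.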

Results 

The boids model was implemented and run on a computer
with an i5 processor and 16 GB of RAM, equipped with an
Nvidia GeForce GT 730 graphics board, under Linux. The
Open Source 3D Graphics Engine (OGRE) was used and
an object-oriented programming approach was applied for
our experiments. The OpenSteer library enabled an accurate
replication of boids, so that experiments could be conducted
on a fair and common basis (Reynolds, 1999). The source code
of this project is available online
(https://github.com/Zapotecatl/MetricsBoids).


The shape of the pond that we used in our simulations is a
cuboid with width, height and depth denoted
as $W$, $H$ and $D$, respectively (see Figure 2). The boids were
initially distributed at separate random positions, where
the maximum distance that separates a boid from the central
point of the pond is 50cm, in such a way that they can
be perceived by each other. The boids avoid colliding with
the walls of the pond, so periodic boundary conditions are
not applied.

Figure 2: The shape of the pond used in our simulations is a
cuboid. The boids can collide with the walls of the pond so
periodic boundary conditions are not applied.

The 3D environment allows us to visualize the result of our
metrics qualitatively (see Figure 3). Videos showing how
the groups look in relation to the metrics are available online
(https://vimeo.com/user49682258/videos).



Figure 3: The figure shows a flock evaluated by the quality metric 
with a value of 0.61. 


Extensive experiments were conducted to evaluate the
performance of the boids model in terms of the previous metrics.
The experiments consisted in varying the radii $r_s$ and $r_a$ that
determine the areas of perception associated with the behaviors
of separation and alignment, in order to estimate the
performance of the flocks under different configurations. The
cohesion radius was $r_c$ = 50cm for all experiments. For each
experiment configuration the simulation was repeated 15 times
to obtain representative averages. The parameters used
in our simulation are shown in Table 1.


Parameter    Value       Description
$W$          300cm       Size of pond in width
$H$          150cm       Size of pond in height
$D$          300cm       Size of pond in depth
$l$          100000      Number of iterations
$n$          50          Number of boids
$r_b$        5cm         Radius of the body
$f_m$        50cm/s      Maximum force
$s_m$        25cm/s      Maximum speed
$m_b$        1           Mass
$r_c$        50cm        Cohesion radius
$r_s$        (0-50)cm    Separation radius
$r_a$        (0-50)cm    Alignment radius
$\theta_c$   360°        Cohesion field of view
$\theta_s$   360°        Separation field of view
$\theta_a$   360°        Alignment field of view
$\alpha$     25          Cohesion weight
$\beta$      25          Separation weight
$\delta$     25          Alignment weight
$k$          225cm       Penalty distance
$p$          180°        Penalty angle
$\sigma$     0.5         Consistency in extension weight
$\gamma$     0.5         Consistency in polarization weight

Table 1: Parameters used in our simulation with the boids model.
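
As a consistency check on these values, and assuming that the “two extreme points of the pond” used to define $max_e$ are opposite corners of the cuboid (an interpretation that the text does not state explicitly), half of the space diagonal is:

```latex
max_e = \frac{1}{2}\sqrt{W^2 + H^2 + D^2}
      = \frac{1}{2}\sqrt{300^2 + 150^2 + 300^2}
      = \frac{1}{2}\sqrt{202500}
      = 225\ \text{cm}
```

Under that assumption the penalty distance $k$ = 225cm in Table 1 coincides with $max_e$, the setting in which the integrity of the boids is highly weighted.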

Figure 4 shows that the total extension decreases as the
separation radius becomes shorter and the alignment radius
becomes bigger. This is explained by the fact that, as the
alignment radius increases, the boids are motivated to establish
the formation.

Figure 5 shows that when boids are assigned a big alignment
radius the total polarization is small. This is explained
by the fact that, as the alignment radius increases,
the boids are motivated to go in the same direction.

Figure 6 shows that the bigger the separation radius, the
smaller the total frequency of collision. This is explained
by the fact that the boids are usually dispersed when they
apply a big separation radius, and therefore the possibility of
collision is reduced.

It is worth recalling that the quality metric provides a
value that combines the results of the consistency in the
extension and the consistency in the polarization previously
introduced. Figure 7 shows that the maximum value of total
quality is in the order of 0.93, resulting from different
configurations, from which we can conclude that, if the integrity
of the members of the flock is highly weighted, the flock is
evaluated with a good quality under these scenarios: (1)
the value of the separation radius is in a range that enables
collision avoidance, namely [20, 50]; (2)
the value of the alignment radius is greater than or equal
to the separation radius; and (3) the value of the alignment
radius is less than or equal to the cohesion radius, as
illustrated in Figure 7.

Figure 4: Extension. The minimum value of total extension is in
the order of 1.64cm with $r_c$ = 50cm, $r_s$ = 0cm, and $r_a$ = 30cm.

Figure 5: Polarization. The minimum value of total polarization
is in the order of 0.79° with $r_c$ = 50cm, $r_s$ = 50cm, and
$r_a$ = 50cm.

Figure 8 shows a configuration of radii of perception $r_c$,
$r_s$ and $r_a$ of the flock that received the highest evaluation
based on the quality metric.

Figure 6: Frequency of collision. The minimum value of total
frequency of collision is in the order of 0.0001 with $r_c$ = 50cm,
$r_s$ = 50cm, and $r_a$ = 50cm.

Figure 7: Quality. The maximum value of total quality is in the
order of 0.93, resulting from different configurations.

Based on the previous results we confirm that the combined
application of the behaviors of cohesion and separation
contributes to keeping the flocking boids gathered without
colliding. The alignment behavior, in turn, is useful to achieve
coordinated motion of the boids and also to avoid collisions.

[Figure 8 labels: $r_b$ = 5cm, $r_s$ = 20cm, $r_a$ = 30cm, $r_c$ = 50cm]

Figure 8: A configuration of areas of perception that received the
highest evaluation based on the quality measure: $r_c$ = 50cm,
$r_s$ = 20cm and $r_a$ = 30cm.


Tuning the parameter values to achieve coordinated flocks
is far from trivial, because the parameters that influence
the behaviors of the boids are closely interrelated. A
small separation radius, for instance, leads boids to gather
in compressed and “well-welded” groups that are, however,
quite prone to collisions among their members. In contrast, a
big separation radius results in boids that are comfortably
separated from each other and that are more prone to detaching
from the flock.

Concluding Remarks and Future Work 

The proposed metrics represent the dynamics of the system
and contribute to the analysis of the
phenomenon of collective coordinated motion in terms of
generic global parameters and the relationships among these
parameters. Therefore, since a flock refers to a group that
shows a class of polarized, non-colliding and aggregate motion
(Reynolds, 1987), the metrics allow a general benchmark
to be established for the evaluation of flock-type models,
not only the boids model.

The performance metrics proposed in this research might
be applied not only for qualifying some aspects of group
behavior but also for tuning the behavior of groups of artificial
agents. In effect, parameters such as the radius of perception,
the field of view, and the weights of the behaviors that produce
flocks able to reach the highest quality can be estimated.
Therefore, the metrics proposed in this work can be used for
the design of flocks.

For example, it is possible to apply a methodology that uses
genetic algorithms for the evolutionary development of a flock
(Wood and Ackland, 2007; Olson et al., 2013). The fitness
function applies the quality metric to punish flocks whose
members collide and to reward flocks whose members do not
collide, which results in the optimization of the behavior of
the agents so that they form a flock and simultaneously do not
collide. The parameters that deliver the optimal results can be
applied in an animated flock or in autonomous flying robots
(Virágh et al., 2014).

As future work, we are also working on the refinement
of performance metrics that enable a more representative
evaluation of the dynamics of the flock than the current metrics.
Note, for instance, that using the current metrics a flock can be
evaluated similarly when boids are fully dispersed and when
they form dispersed subgroups in the environment. These
situations should be properly distinguished.
Abstract

This article presents a novel bio-inspired emergent gradient
taxis principle for robot swarms. The underlying communication
method was inspired by slime mold and fireflies. Nature
showcases a number of simple organisms which can display
complex behavior in various aspects of their lives such as
signaling, foraging, mating etc. Such decentralized behaviors
at the organism level give rise to an emergent intelligence
such as in bees, slime mold, fireflies etc. Chemotaxis and
phototaxis are known to be abilities exhibited by simple
organisms without elaborate sensory and actuation capabilities.
Our novel algorithm combines the underlying principles of
slime mold and fireflies to achieve gradient taxis purely based
on neighbor-to-neighbor communication. In this article, we
present a model of the algorithm and test the algorithm in a
multiagent simulation environment.

Introduction 

Swarm robotics research has shown that complex problems
can be solved in unconventional ways (Schmickl
and Hamann, 2011; Schmickl et al., 2008; Bjerknes et al.,
2007). Swarm intelligence uses simple agents following
simple rules to solve complex problems. Without the advantages
of swarm intelligence, such complex problems could
only be solved with a relatively high amount of computational
and economic resources. In project subCULTron (subCULTron,
2015), we aim to develop an underwater society of
learning, adapting, self-sustaining robots which can be used
for various applications. Given the challenges of robotic
systems underwater, such as limited availability of classical
communication systems, limited mobility etc., there need to
be stable but simple solutions. In nature, there are many such
behaviors, such as chemical gradient taxis (chemotaxis) in
slime mold (Webb, 1998) or thermal taxis in bees (Grodzicki
and Caputa, 2005). It is remarkable that nature solves these
problems with minimal resources. Gradient taxis is an
example of an algorithm that will be used in subCULTron for
gradient ascent or descent. Like many swarm researchers in
the past (Schmickl and Crailsheim, 2007; Nakagaki, 2001),
we draw inspiration from nature to solve gradient ascent
for robots in subCULTron.


Many studies have been carried out in the past on the application
of swarm-based algorithms in robotics. Swarm behavior
is based on decentralized underlying rules at the organism
level giving rise to an emergent intelligence such as in bees
(Bodi et al., 2015; Kernbach et al., 2009), slime mold (Nakagaki
et al., 2004), fireflies (Buck and Buck, 1966) etc. The
aggregation and maze-solving capabilities of slime mold
have been extensively researched (Nakagaki, 2001). Slime
mold swarms have been used to solve mazes (Nakagaki
et al., 2000) and this capability has been tested in real-world
scenarios such as the Tokyo rail transport system (Nakagaki
et al., 2004). Similarly, fireflies and their ability of
phase-synchronized pinging have been of interest to the research
world for a long period of time (Buck and Buck, 1966).
Many such behaviors have found applications in engineering
and computer science. For example, Yang (2009) has
taken inspiration from fireflies to solve multi-modal
optimization problems, and such efforts have shown promising
results.

In this paper, we present a novel method which combines
communication behavior from slime mold as well as fireflies
for gradient ascent. Various kinds of gradient functions
are presented to a swarm of agents to test the algorithm. The
following sections will first formulate the algorithm, then
describe the testing scenarios and methods, and finally discuss
the results. Our algorithm is a fine example of how emergent
solutions can be used for tasks without complex computation
or large amounts of memory, and with minimum power
consumption. There exist classical approaches for gradient
ascent and multi-modal optimization, such as steepest
gradient descent (Arfken, 1985), Particle Swarm Optimization
(Kennedy and Eberhart, 1995) etc. Another possible
solution for gradient-related problems is Simultaneous
Localization And Mapping (SLAM) (Bazeille and Filliat, 2011),
where an agent constructs a map of an unknown environment
while simultaneously keeping track of its own location
and the gradient value. A global observer is then able to
guide the agent to the maximum or minimum gradient
value. However, these solutions require a high amount of
computation power, which is economically and computationally
unfeasible for a swarm of underwater robots such as
that in subCULTron.

The objectives of this paper are as follows: 

1. Formulate a novel emergent gradient taxis algorithm 
inspired by slime mold and fireflies. 

2. Test the algorithm in various types of gradients. 

3. Validate the algorithm by investigating boundary 
conditions. 

4. Discuss strengths and weaknesses of the algorithm and 
compare it with an existing emergent gradient taxis 
algorithm (swarmtaxis; Bjerknes et al., 2007). 

Biological Inspiration 

As previously mentioned, the presented algorithm takes 
inspiration from slime mold and fireflies. It is therefore 
worthwhile to look at the aspects of their biological 
behavior that we draw on. This section briefly discusses 
the communication strategies used by slime mold and 
fireflies, which support the formulation of the algorithm. 

Slime Mold 

Slime mold (Dictyostelium discoideum) is a free living 
diploid life form. It has been the subject of much study 
due to its ability to survive harsh environments by taking 
advantage of group behavior. During its life cycle, slime 
mold aggregates with other cells to form a multicellular 
organism. Each organism starts its life as a unicellular 
amoeba, but during starvation the amoebae aggregate to form 
a multicellular fruiting body. Chisholm and Firtel (2004) 
divide the life cycle as follows: Aggregation, Streaming, 
Slug, Culmination, Fruiting body. The algorithm presented 
in this paper deals mainly with the aggregation phase, 
which is therefore described in detail. 

When there are ample food sources, cells grow and divide 
in a matter of three to four hours (Siegert and Weijer, 
1992). If food is scarce, on the other hand, significant 
cooperation between the cells begins, kicking off the 
aggregation phase. During this time, some cells (centers) 
release Cyclic Adenosine Monophosphate (cAMP) into the 
environment to induce a chemical concentration spike around 
them (Siegert and Weijer, 1992). cAMP diffuses very quickly 
into the environment, so the chemical spike is short-lived. 
The spike enables the centers to recruit other cells around 
them. When surrounding cells perceive this chemical signal, 
they move towards areas of high cAMP concentration and 
release cAMP themselves, thereby relaying the signal. This, 
in turn, attracts other cells towards the centers. One 
cell is able to release cAMP at an interval of 12-15 
seconds (Alcantara and Monk, 1974); during this interval, 
individual cells are insensitive to cAMP pulses. This 
interval can be understood as the refractory phase of the 
amoeba. The signal relaying mechanism described above forms 
the basis for spatiotemporal patterns known as scroll waves 
(Siegert and Weijer, 1992). The refractory phase is 
responsible for these scroll waves, as it prevents the 
signaling organism from perceiving its own signal that was 
relayed earlier. The emergence of scroll waves enables the 
amoebae to move towards the recruiting centers for 
successful aggregation. 

Fireflies 

Fireflies are a family of insects that are capable of 
producing bio-luminescence to attract a mate or prey (Buck 
and Buck, 1966). The brightness of the bio-luminescent 
light depends on the amount of luciferin, a light emitting 
compound, available to the firefly (de Oliveira et al., 
2011). The bio-luminescence of various families of 
fireflies has been studied at length (Buck and Buck, 1966). 
Apart from being able to blink, fireflies are known to 
cooperate with other fireflies. It is a spectacular sight 
to see thousands of fireflies on a tree light up in unison, 
illuminating it entirely. This synchronized blinking gives 
the swarm a higher chance of attracting mates or prey (Buck 
and Buck, 1966), as the luminescence of the blinking swarm 
is much greater than that of an individual firefly. 

Such synchronicity results from a simple mechanism: 
initially, the individual fireflies blink randomly; when a 
firefly perceives a blink in its surroundings, it blinks 
again and then resets its own frequency to match the other 
(Camazine et al., 2001). It takes time for the fireflies to 
achieve complete synchronization. This is analogous to a 
phase coupled oscillator which adjusts its phase to match 
that of the faster one in its vicinity. This trait leads to 
a pseudo synchronized blinking pattern in which the 
blinking frequency is influenced by the fastest blinking 
insect. 

FSTaxis algorithm 

As per the objectives listed in the introduction, a novel 
gradient taxis algorithm is presented here: the Firefly 
Slime mold Taxis (FSTaxis) algorithm. As its name suggests, 
this algorithm draws inspiration from the biological 
systems introduced in the section "Biological Inspiration". 
The FSTaxis algorithm makes use of the communication 
strategy of slime mold and the phase coupled oscillation of 
fireflies. The behavior of agents in the FSTaxis algorithm 
can be broadly classified into ping behavior and motion 
behavior, which the following sections explain. The 
sequential flow of instructions of the FSTaxis algorithm is 
given in Algorithm 1. Hereafter, a "ping" refers to the 
single bit communication which each agent broadcasts. The 
agents are equipped with sensors to determine the direction 
of incoming pings and the value of the environmental factor 
of interest at their own location. 

Figure 1: When the intrinsic cycle length of an agent counts out, 
a ping is broadcast; surrounding agents capture the ping and 
relay it on. The blue circles show the original agent whose internal 
clock triggered. The white agents in the figure relay the pings. 


The pings can be perceived by other robots within a very 
limited sensor radius, s_r. Also, the agents are able to move 
around in the environment with limited speed, v_a. 

Ping behavior 

Each agent has three communication states: "pinging" 
(called "active" in Figure 2 and Algorithm 1), "refractory" 
and "inactive", as shown in the state transition diagram in 
Figure 2. By default, all agents are set to inactive mode, 
and each of them has an internal countdown timer whose 
initial value is associated with its position in the 
environmental gradient. In the inactive mode, the agent 
only checks for incoming pings. When an agent receives a 
ping, it broadcasts a ping for a period of time, t_p. During 
t_p, the agent is said to be pinging; after t_p, the agent 
enters the refractory mode. During the refractory time, 
t_r, the agent ignores all incoming pings. After t_r, the 
agent sets itself back to inactive mode. The cycle repeats 
when the agent receives another ping. 

Each agent has an inherent cycle time determined by the 
environmental gradient at its position. As shown in Figure 
1, if the internal timer of an agent counts to zero before 
a ping is received, the agent broadcasts a ping and sets 
its own ping frequency, f_p, by associating it with the 
gradient value at its position, g_p. That "original" ping 
is further relayed by the neighboring agents as per the 
ping behavior explained above. The agent that triggers the 
original ping (the agent whose internal timer counted to 
zero) is referred to as the "leader" in the following 
sections of this paper. 
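
The ping behavior described above can be read as a small state machine. The following Python sketch is only an illustration of that reading, not the implementation used in the experiments. The class name `Agent`, the method `step` and the argument `ping_heard` are hypothetical, and, as a simplifying assumption, the countdown timer is only decremented in the inactive mode and restarted at the end of the refractory period.

```python
from dataclasses import dataclass

INACTIVE, PINGING, REFRACTORY = "inactive", "pinging", "refractory"


@dataclass
class Agent:
    cycle_time: float        # inherent cycle time, set from the local gradient
    t_p: float = 5.0         # ping duration in seconds (Table 1)
    t_r: float = 5.0         # refractory duration in seconds (Table 1)
    mode: str = INACTIVE
    timer: float = 0.0       # time left in the current pinging/refractory phase
    countdown: float = 0.0   # internal countdown timer
    leader: bool = False

    def __post_init__(self):
        self.countdown = self.cycle_time

    def step(self, dt, ping_heard):
        """Advance by dt seconds; return True while the agent is broadcasting."""
        if self.mode == INACTIVE:
            self.countdown -= dt
            if self.countdown <= 0:      # own clock fired first: "original" ping
                self.mode, self.timer, self.leader = PINGING, self.t_p, True
            elif ping_heard:             # relay a neighbour's ping
                self.mode, self.timer, self.leader = PINGING, self.t_p, False
        elif self.mode == PINGING:
            self.timer -= dt
            if self.timer <= 0:
                self.mode, self.timer = REFRACTORY, self.t_r
        elif self.mode == REFRACTORY:    # incoming pings are ignored here
            self.timer -= dt
            if self.timer <= 0:
                self.mode, self.leader = INACTIVE, False
                self.countdown = self.cycle_time   # restart the internal timer
        return self.mode == PINGING
```

Because a refractory agent never reacts to `ping_heard`, a relayed ping cannot bounce straight back to the agent that sent it, which is the property the refractory phase is meant to provide.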

To scale the ping frequencies to meaningful values, a 
preset maximum and minimum are selected for the gradient 
under consideration. Let these values be g_max and g_min. 
Equation 1 relates the ping frequency of an agent, and 
hence its inherent cycle time, to the gradient value at its 
position. The selection of the constants α and ω depends on 
t_p, t_r and the boundary values of the gradient under 
consideration. Here, α and ω should be selected so that the 
agents continue pinging over the entire range of possible 
gradient values. For example, if t_p is equal to the cycle 
time 1/f_p for any agent, it will ping continuously without 
ever entering the refractory phase. Therefore, α and ω must 
be adjusted to scale the ping frequencies to meaningful 
values. In this paper, α and ω are selected only to 
demonstrate the gradient ascent ability of the FSTaxis 
algorithm. Since the algorithm does not depend upon the 
type of gradient, the values of the constants are the same 
throughout this paper, as shown in Table 1. 

$$f_p = \alpha + \left(\frac{g_p - g_{min}}{g_{max} - g_{min}}\right) \omega \qquad (1)$$
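
As a worked illustration of the frequency scaling, the sketch below evaluates Equation 1 with the constants of Table 1. It assumes that the equation has the form f_p = α + ω (g_p − g_min) / (g_max − g_min) and that the cycle time is 1/f_p, as in Algorithm 1. The function names are hypothetical.

```python
ALPHA = 0.008               # offset constant (Table 1)
OMEGA = 0.1                 # scaling constant (Table 1)
G_MIN, G_MAX = 5.0, 50.0    # preset gradient bounds in metres (Table 1)


def ping_frequency(g_p, alpha=ALPHA, omega=OMEGA, g_min=G_MIN, g_max=G_MAX):
    """Assumed form of Equation 1: map the local gradient value to f_p."""
    return alpha + (g_p - g_min) / (g_max - g_min) * omega


def cycle_time(g_p):
    """Inherent cycle time 1 / f_p, as used in Algorithm 1."""
    return 1.0 / ping_frequency(g_p)


for g in (5.0, 27.5, 50.0):
    print(f"g_p={g:5.1f}  f_p={ping_frequency(g):.4f}  cycle={cycle_time(g):7.2f} s")
```

Under these assumptions, an agent at g_min has a cycle time of 125 s and an agent at g_max a cycle time of roughly 9.3 s, so the agent at the highest gradient value is the first whose timer runs out.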

Motion behavior 

The motion behavior of the agents is governed by the ping 
behavior. An agent in inactive mode does not move. As shown 
in Figure 2, motion is initiated in the active mode. When 
an agent receives a ping, it sets itself to active mode, 
sets its heading towards the received ping and starts 
moving to cover a fixed distance, β, at velocity v_a. A 
ping can only be perceived within the limited sensor range, 
s_r, of the robot, which limits the number of agents that 
are able to affect any particular agent. It is possible 
that an agent receives multiple pings from different 
directions, h_n, where n is the number of agents pinging. 
In such a case, the agent calculates the mean heading, 
h_mean, and sets its heading towards this mean. 
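
The text does not state how the mean heading is computed. One common choice, assumed here, is the circular mean, which avoids the wrap-around problem of averaging angles such as 350° and 10° arithmetically. The sketch below uses mathematical angles in degrees (counter-clockwise from the positive x-axis), which is also an assumption, and the function names are hypothetical.

```python
import math


def mean_heading(headings_deg):
    """Circular mean of ping directions in degrees, wrapped to 0..360.

    Returns None when the directions cancel out and no mean is defined.
    """
    x = sum(math.cos(math.radians(h)) for h in headings_deg)
    y = sum(math.sin(math.radians(h)) for h in headings_deg)
    if math.hypot(x, y) < 1e-9:
        return None
    return math.degrees(math.atan2(y, x)) % 360.0


def move(pos, heading_deg, beta=1.5):
    """Advance the fixed distance beta (in patches, Table 1) along the heading."""
    rad = math.radians(heading_deg)
    return (pos[0] + beta * math.cos(rad), pos[1] + beta * math.sin(rad))
```

With pings arriving from 0° and 90°, for example, the agent would head towards 45°; two pings from exactly opposite directions give no preferred direction, a case a real implementation would have to handle explicitly.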

If an agent's internal clock triggers, an "original" ping 
based on the environmental value is broadcast; this agent 
then labels itself the leader and does not move in that 
particular cycle. 

When a swarm of agents executes the FSTaxis algorithm as 
described above, scroll waves of pings similar to those in 
slime mold (see section "Slime Mold") propagate through the 
swarm. Since the internal timer of the robot at the highest 
gradient value counts to zero first, the waves travel from 
higher to lower gradient values. During their active 
cycles, the agents move towards the mean direction of 
incoming pings. Since the "leader" broadcasts the original 
ping and does not move, the agents gather around the 
leader. Once the agents are in their new positions, their 
internal clocks are reset according to the local 
environmental (gradient) value. Whichever agent's internal 
clock triggers first becomes the next leader, and the swarm 
then gathers around this agent. This repeated choosing of 
leaders and gathering around them draws the swarm towards 
areas with higher gradient value; in essence, gradient 
ascent emerges. 

Method 

To demonstrate the gradient ascent capability, a linear gra¬ 
dient, a hyper ellipsoid gradient and noisy variants of these 

Algorithm 1 The FSTaxis algorithm 

repeat 
    procedure PING BEHAVIOR(t_p, t_r, v_a, t_f) 
        for all agents do 
            if ping mode = refractory mode then 
                count down t_r 
                if t_r = 0 then 
                    set state ← inactive mode 
                    set t_f ← 1/f_p (Equation 1) 
                    set leader status ← "OFF" 
                end if 
            end if 
            if ping mode = active mode then 
                count down t_p 
                if t_p = 0 then 
                    set state ← refractory mode 
                end if 
            end if 
            if ping mode = inactive mode then 
                if any ping received then 
                    set state ← active mode 
                    set move agent ← "ON" 
                end if 
            end if 
            count down t_f 
            if t_f = 0 then 
                set state ← active mode 
                set leader status ← "ON" 
            end if 
        end for 
    end procedure 
until forever 

repeat 
    procedure MOVEMENT(move agent, leader status) 
        for all agents with move agent = "ON" and leader status != "ON" do 
            while distance covered < β do 
                create empty list, l 
                for i ← 1, number of pings received do 
                    append h_i to list l 
                end for 
                calculate h_mean of list l 
                set agent heading h_a ← h_mean 
                move agent with fixed velocity v_a 
            end while 
            set move agent ← "OFF" 
        end for 
    end procedure 
until forever 

Figure 2: State transition diagram of the FSTaxis algorithm. 
The algorithm has two behaviors: ping behavior and 
motion behavior. In ping behavior, there are three states: active, 
refractory and inactive. An agent is in the active state when it receives 
a ping from a surrounding agent or when its own internal clock 
triggers. After the ping duration, the agent transitions into the 
refractory mode. After the refractory time, the agent transitions into the 
inactive mode. The active mode triggers motion behavior, and the 
agent moves a preset distance towards the ping it received. 


gradients are used as test functions. Since depth is of 
relevance in project subCULTron, it is taken as the 
physical quantity of interest in this paper. As mentioned 
before, the frequencies are scaled according to Equation 1, 
and Table 1 shows all the constants used in this 
experiment. The simulation environment used is NetLogo 
4.3.1 (Wilensky, 1999). In NetLogo, the test area is 
divided into "patches" (spatial units) and the agents are 
called "turtles". For the purpose of this experiment, depth 
is the physical quantity associated with each patch. The 
sensor radius of each agent is measured in patches; in this 
experiment it is taken to be 3 patches, since this is a 
reasonable range for underwater communication. 


Constants   ω     t_p   t_r   s_r   β     α       g_max   g_min 
Value       0.1   5     5     3     1.5   0.008   50      5 
Units       -     s     s     p¹    p¹    -       m       m 

Table 1: Constants used in the FSTaxis algorithm. 


Linear gradient 

This section aims to demonstrate the basic capability of 
the FSTaxis algorithm to traverse a gradient. The equation 
for the gradient is that of a line, as shown in Equation 2: 

f(x) = x (2) 

¹ The unit p in Table 1 represents the number of patches in NetLogo. 

Figure 3: A linear gradient was presented to the FSTaxis algorithm. 
The grayscale shows the gradient value and the goal is the darkest 
area; the star symbol represents the starting point and the inverted 
triangle marks where the swarm converged. 

Figure 3 shows the simulation environment in NetLogo set 
up with a linear gradient. The color scaling represents the 
gradient value of the environment, so the goal of the 
gradient ascent algorithm is to move towards the dark 
colored areas. The red line represents the trajectory of 
the mean position of the swarm from the starting point, 
represented by the star, to convergence, represented by the 
inverted triangle. The swarm is said to converge when its 
mean position oscillates around the area with the maximum 
gradient value. The trajectory shown is one exemplary run 
out of the 100 runs conducted with this test gradient. All 
100 runs converged to the maximum gradient value. 

Hyper ellipsoid gradient 

This section presents an environment for testing the 
FSTaxis algorithm in a relatively more complex gradient, a 
three dimensional axis parallel hyper ellipsoid. The axis 
parallel hyper ellipsoid, represented by its standard form 
in Equation 3, is a convex, continuous and multi-modal 
function. 

$$f(x) = \sum_{i=1}^{2} x_i^2, \quad \text{where } -5.12 \le x_i \le 5.12 \qquad (3)$$

For the hyper ellipsoid gradient, there are four goals at 
the corners of the arena, where the gradient value is 
highest. The area of the goals (the corners of the 
ellipsoid) is merely 0.23% of the total area of the arena. 
Therefore, the chance of the swarm converging to a goal at 
random is minimal. Figure 4 shows the FSTaxis algorithm 
tested with a hyper ellipsoid gradient; the thick red line 
shows the trajectory of the swarm. The black star marks the 
starting point and the inverted triangle shows the area of 
convergence. The trajectory shown is one exemplary run out 
of the 100 runs conducted with this test gradient. All 100 
runs converged to the maximum gradient value. 
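
To see why this test function yields four equally good goals, the sketch below evaluates a two-variable sum of squares on the stated domain. It assumes that Equation 3 is a plain sum of squared coordinates over two variables. The names `hyper_ellipsoid`, `corners` and `grid` are hypothetical, and the 17 by 17 sampling grid is chosen only for illustration.

```python
import itertools

BOUND = 5.12   # domain limit from Equation 3


def hyper_ellipsoid(x):
    """Assumed reading of Equation 3: sum of squared coordinates."""
    return sum(xi ** 2 for xi in x)


# The four corners of the square arena.
corners = list(itertools.product((-BOUND, BOUND), repeat=2))
corner_values = [hyper_ellipsoid(c) for c in corners]

# Coarse sample of the arena to compare against the corner values.
grid = [(-BOUND + i * 0.64, -BOUND + j * 0.64)
        for i in range(17) for j in range(17)]
grid_max = max(hyper_ellipsoid(p) for p in grid)

print(corner_values, grid_max)
```

All four corners share the same value, and no sampled point exceeds it, so a swarm performing gradient ascent can end up at any of the four corners depending on where it starts.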


Figure 4: An axis parallel hyper ellipsoid gradient was presented 
to a swarm executing FSTaxis algorithm. The grayscale shows the 
gradient value and the goal is the darkest area; the star symbol 
represents the starting point and the inverted triangle marks where 
the swarm converged. 



Figure 5: A linear gradient with numerous local maxima was pre¬ 
sented to the FSTaxis algorithm. The grayscale shows the gradient 
value and the goal is the darkest area; the star symbol represents 
the starting point and the inverted triangle marks where the swarm 
converged. 

Noisy Gradients 

To test the ability of the FSTaxis algorithm to overcome 
small local optima, 20 randomly generated obstructions or 
"hills" were introduced into the smooth gradient. These 
obstructions can trap the agents at local optima if 
sufficient exploration is not introduced. Figures 6 and 5 
show a randomly chosen successful run out of the 10,000 
iterations of the FSTaxis algorithm with the noisy hyper 
ellipsoid gradient and the noisy linear gradient, 
respectively. The experiments with noisy gradients were 
run with varying steepness and spread of the local optima. 
The number of obstructions was kept constant at 20. For 
each obstruction spread ranging from 1 to 10 and steepness 
ranging from 0.25 to 2.5 times the normal gradient, 100 
iterations were run to observe the convergence to the goal. 
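
The parameter sweep described above can be enumerated as follows. The step sizes are an assumption: spreads of 1 to 10 patches in steps of 1 and steepness values from 0.25 to 2.5 in steps of 0.25 give a ten by ten grid, which, at 100 iterations per setting, matches the 10,000 iterations mentioned in the text. The variable names are hypothetical.

```python
from itertools import product

N_OBSTRUCTIONS = 20       # kept constant in all noisy experiments
RUNS_PER_SETTING = 100    # iterations per (spread, steepness) pair

spreads = list(range(1, 11))                   # 1..10 patches (step assumed)
steepness = [0.25 * k for k in range(1, 11)]   # 0.25..2.5 (step assumed)

settings = list(product(spreads, steepness))
total_runs = len(settings) * RUNS_PER_SETTING

print(len(settings), "settings,", total_runs, "runs in total")
```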

Figure 6: An axis parallel hyper ellipsoid gradient with numerous 
local maxima was presented to the FSTaxis algorithm. The grayscale 
shows the gradient value and the goal is the darkest area; the star 
symbol represents the starting point and the inverted triangle marks 
where the swarm converged. 

Results 

The FSTaxis algorithm, tested as described in the section 
"Method", successfully traverses the linear gradient as 
well as the hyper ellipsoid gradient in all 100 iterations 
conducted. Figures 3 and 4 clearly show the ability of the 
algorithm to perform gradient ascent. 

Performance in noisy gradients 

When agents executing the FSTaxis algorithm are presented 
with obstructions in the gradient (see subsection "Noisy 
Gradients"), they are able to overcome the local optima 
introduced. The agents climb the gradients readily, as seen 
in Figures 5 and 6. As individual agents move towards the 
leader (the agent whose internal clock triggered a ping), 
they overshoot it and cross out of the hill, escaping the 
local optimum. Figure 7 relates the size (in number of 
patches) and steepness of the local optima to the 
percentage of runs that converged to the goal. When the 
size of the local optima is under three patches (for 20 
obstacles in the arena), 100% of runs converge to the goal. 
As the size and steepness of the local optima rise, the 
rate of convergence decreases. 

Performance with multiple gradients 

Figure 8 shows the region of convergence relative to the 
starting point of the swarm for 100 iterations. The X-axis 
shows the quadrant of the arena where the swarm started and 
the Y-axis shows the region of convergence. Numbers 1, 2, 
3 and 4 refer to the quadrants of the Cartesian coordinate 
system. In 95% of the runs, the swarm converges to the goal 
nearest to it. The 5% error is attributed to the fact that, 
when the swarm starts at the midpoint between two 
gradients, it has to choose which gradient 




Figure 7: The color map shows the percentage of runs that converged in 
the presence of noise. Each gradient was given 20 obstructions 
or local optima. The X-axis represents the steepness 
of each of these obstructions with respect to the normal gradient. 
The Y-axis represents the area covered by each of these 20 
obstructions (measured in patches). Each patch is 0.05% of the entire 
arena. For each combination of hill size and steepness, 100 iterations were 
run. White areas show 100% convergence, and shades 
towards red and black show lower convergence, as 
per the color map provided. 

to ascend. This decision is made randomly, since it depends 
on which agent's internal clock triggers first. 

Discussion 

In contrast to many classical solutions for gradient 
ascent, the FSTaxis algorithm is efficient, simple and 
requires only simple hardware. Since agents do not actively 
compare their current gradient value with the previous one, 
gradient ascent is purely an emergent trait. The FSTaxis 
algorithm uses no evaluation function and acts purely on 
local knowledge. An agent merely has to be informed about 
the presence of other agents within its sensor radius. The 
only requirements on an agent are therefore to sense the 
gradient value at its current position, to adjust its own 
ping behavior accordingly and to broadcast a 1-bit 
communication to make its presence known. These qualities 
make the FSTaxis algorithm a good choice for real world 
gradient taxis when resources are sparse. Many gradient 
taxis solutions have been proposed in the past for single 
agents. Examples include the standard hill climber (Davis, 
1991) and helical klinotaxis (Long et al., 2004). While 
these solutions work well for individual robots, they are 
not designed for a group of robots. A few swarm algorithms 
for emergent gradient taxis have been proposed in the past, 
such as swarmtaxis (Bjerknes et al., 2007) and the 
Artificial Homeostatic Hormone System (Schmickl et al., 
2010). The swarmtaxis algorithm is an emergent gradient 
taxis solution like the FSTaxis algorithm and is based on a 
single ping broadcast communication. Therefore, it is of 
merit to compare the FSTaxis 

Figure 8: Bubble plot showing the region of convergence relative to 
the starting point of the swarm for 100 iterations. The reference line 
at the lower right corner shows the diameter of a 25 run bubble. The 
X-axis shows the quadrant of the arena where the swarm started 
and the Y-axis shows the region of convergence. Numbers 1, 2, 3 and 
4 refer to the quadrants of the Cartesian coordinate 
system. In 95% of the runs, the swarm converges to 
the goal nearest to it. 

algorithm with the swarmtaxis algorithm. 

Bjerknes et al. (2007) presented the "swarmtaxis" 
algorithm, which is also an emergent solution for gradient 
taxis. It is worth mentioning, as the FSTaxis algorithm 
drew inspiration from this work. The swarmtaxis algorithm 
works by creating a frontier of agents which face the 
source (light) and are termed "illuminated". The 
illuminated agents cast a shadow on the agents behind them, 
which are termed "shadowed". The illuminated robots have a 
higher sensing distance than the shadowed robots and hence 
move away from the shadowed robots. In essence, they move 
towards the light source. 

The swarmtaxis algorithm guarantees that the swarm will 
converge to the source. It is a stable way to ascend the 
light gradient; however, it is not suitable for use in 
swarm robotics, as it imposes various limitations on the 
kind of gradient it can ascend. For example, the swarmtaxis 
algorithm assumes that each robot has the ability to 
physically occlude another from the source. This assumption 
holds when the light source is at the same level as the 
agents, but not otherwise. The FSTaxis algorithm overcomes 
this limitation by depending upon the local gradient value, 
so the need for occlusion never arises. 

If there are two light sources, the swarmtaxis algorithm 
will create two frontiers and will not select the brighter 
of the two sources to move to. Although the swarmtaxis 
algorithm makes a local gradient value sensor unnecessary, 
this comes at the cost of not being able to handle multiple 
sources. The FSTaxis algorithm, on the other hand, is able 
to handle multiple sources or maxima at the cost of using a 
local gradient value sensor. 


As per the objectives of the paper, boundary conditions 
for the FSTaxis algorithm were investigated. It has been 
shown that the FSTaxis algorithm is able to work with 
multiple local optima. As seen in Figure 7, in the presence 
of local optima that are steep enough, the FSTaxis 
algorithm is likely to get stuck in a local maximum. For 
problems such as multi-modal optimization, for example 
solving a rotated hyper ellipse, the FSTaxis algorithm does 
not guarantee convergence to the global maximum. 

Conclusion 

This paper demonstrates that the FSTaxis algorithm is a 
feasible solution for gradient ascent in swarm robotics. It 
is especially attractive because it requires only single 
bit communication between the agents. 

As discussed previously, the FSTaxis algorithm does not 
guarantee a solution for multi-modal gradients. In the future, 
extensions of this algorithm can be formulated for use in 
multi-modal optimization. 

The FSTaxis algorithm works based on sharing of spatial 
gradient related information via frequency of pings. This pa¬ 
per is a successful demonstration of information exchange 
without explicitly sending data. In the future, more re¬ 
sources can be dedicated towards how more information re¬ 
garding gradients can be shared and how the agents can use 
this to change their behavior. 

To relate its current position to its own ping frequency, 
an agent executing the FSTaxis algorithm scales the 
gradient value as per Equation 1. This provides a way to 
tune the equation to the gradient of interest. Moving 
forward, it would be useful to have a single equation that 
can be used globally, without the need to scale the 
parameters manually. 

Abstract 

This paper describes the way in which we have employed 
agent-based models to understand fission-fusion dynamics 
(FFD), a collective pattern of behavior in many social animals. 
Groups with a high degree of FFD split into subgroups that 
vary in size, cohesion and composition, often within short 
temporal scales. These dynamics are thought to be more 
complex than those of other species with cohesive, stable 
groups, leading to hypotheses about the origin of social 
intelligence. Also, a flexible grouping pattern is supposed to be 
an adaptive solution to the temporal and spatial variation in 
feeding resources. We have used models where relatively 
simple agents forage in realistic, heterogenous environments 
and have shown that, for intermediate levels of heterogeneity in 
the size of food patches, agents form subgroups that vary in size 
and composition in a similar fashion as they do in species with 
a high degree of FFD. We have also explored the idea that by 
splitting in subgroups that vary in size, animals can exploit a 
heterogeneous environment with ephemeral food sources more 
efficiently than cohesive groups. Agent-based models have 
provided ways to test hypotheses and develop predictions about 
social and ecological dynamics. 

Fission-fusion dynamics (FFD) is a property of many groups 
of animals that split into subgroups of variable size, 
composition and degree of cohesion (Kummer, 1971; Aureli et 
al., 2008). Species that show this property, to different 
degrees, include, among mammals, baboons (Papio spp.), 
chimpanzees (Pan spp.), spider monkeys (Ateles spp.), African 
elephants (Loxodonta africana), hyenas (Crocuta crocuta), 
dolphins (Tursiops truncatus) and bats (Myotis bechsteinii), as well as 
several species of birds (Aureli et al., 2008; Silk et al., 2014). 

A proposed adaptive function of FFD is that it allows for an 
efficient exploitation of resources that are distributed 
heterogeneously in time and space, by adjusting the size of the 
foraging units to the local density of resources (Kummer, 
1971; Aureli et al., 2008). While there have been some partial 
tests of this idea, the complexity of environmental variables 
and the different ways in which a particular behavior could be 
adaptive have led to contradictory results (Chapman et al., 
1995; Newton-Fisher et al., 2000; Symington, 1988). 

In terms of the mechanistic basis of FFD at the level of 
individual behavior, it has been proposed that, because species 
with FFD confront a greater diversity of social situations than 
species with cohesive groups, they should be subject to 
selection for higher cognitive abilities. These would underlie 
processes of information sharing or withholding and special 
social interactions that allow individuals to cope with the 
constant fission and fusion of subgroups (Aureli et al., 
2008). 

We have used agent-based models to complement field 
observations of the social behavior of spider monkeys (Ateles 
geoffroyi; Ramos-Fernandez et al., 2003) in order to 
understand their movement and grouping patterns. We 
extended an agent-based model initially aimed at 
understanding the movement trajectories of a single forager in 
heterogeneous environments (Boyer et al., 2006) by 
incorporating several foragers (Ramos-Fernandez et al., 2006). 
Our goal was to understand the minimum conditions that 
would give rise to a fission-fusion grouping pattern among 
foragers. We set up an environment where discrete food 
patches vary in size according to an inverse power-law (as real 
trees do in many tropical and temperate forests: Enquist and 
Niklas, 2001) with some large patches and many small 
patches. Here, a set of foragers moves according to a local 
optimality rule, maximizing the size of the next visited patch 
and minimizing the distance traveled to it. In each iteration of 
the simulation, a forager takes a step or reduces the food 
content of a patch by one unit. In addition, foragers do not 
come back to a previously visited patch. 
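
The movement rule can be illustrated with a short sketch. The text only says that foragers maximize the size of the next patch while minimizing the distance to it; combining the two criteria as a distance-to-size ratio is an assumption made here for illustration, as are the function name `next_patch` and the representation of a patch as an `(x, y, size)` tuple.

```python
import math


def next_patch(pos, patches, visited):
    """Return the index of the unvisited patch with the smallest
    distance-to-size ratio, or None if no patch is left.

    pos     -- (x, y) position of the forager
    patches -- list of (x, y, size) tuples
    visited -- set of patch indices this forager has already visited
    """
    best, best_cost = None, math.inf
    for idx, (x, y, size) in enumerate(patches):
        if idx in visited or size <= 0:
            continue                     # never return to a visited patch
        cost = math.dist(pos, (x, y)) / size
        if cost < best_cost:
            best, best_cost = idx, cost
    return best
```

Under this assumed rule, a large patch can be preferred over a much closer small one, which is the kind of long step towards large patches described for intermediate levels of patch size heterogeneity.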

Even though the model does not specify any interaction 
among foragers, it does have an implicit fission-fusion 
mechanism: when two foragers coincide in the same patch, 
they form a temporary aggregation that can continue as they 
forage together in other patches, whereas they can also split 
due to their previous history of visits. Figure 1 shows a 
summary of the different situations observed in the model: for 
certain values of the parameter that controls patch size 
heterogeneity, the foraging trajectories and the size of the 
temporary aggregations are similar to those described in field 
studies of spider monkeys (Symington 1988; Ramos- 
Fernandez & Ayala-Orozco 2003). Particularly, intermediate 
values of patch size heterogeneity led to the longest foraging 
trajectories and the largest aggregations. This is because 
foragers traveled long distances to reach large patches that 
were neither rare nor scarce, coinciding with others more 
often at these large patches (Ramos-Fernandez et al. 2006). 

These results show that an important ecological influence 
on FFD could be the relative abundance of patches of different 
size, in contrast with the usual measures of overall food 
abundance or average patch size. This can be taken as a 
prediction for field studies in which the size of visited patches 
can be measured. 

Figure 1. Movement trajectories described by foragers in the agent-based model by Ramos-Fernandez et al. (2006). Each line (in a different color)
represents a different forager. Starting from randomly assigned positions, foragers move to the nearest and largest food patch available (points not
visible in the figure). The left panel corresponds to a situation with maximum heterogeneity in patch size and thus a comparatively large
proportion of large patches. In this situation, foragers find a large patch very close and the simulation “freezes” with very little interaction between
foragers. On the contrary, the panel on the right represents a situation with minimum heterogeneity in patch size, and thus very few large patches.
Here, foragers simply visit the nearest patch, describing long trajectories with many changes in direction. They may coincide with others but they
mostly forage locally. The situation in the middle panel represents an intermediate level of heterogeneity in patch size, with some large patches
that are often worth visiting even when they are far. The trajectories described by foragers are longer, with a combination of small and long
“steps” between patches. This is the situation when foragers formed the largest subgroups with others with whom they coincided in these large
patches.

We then developed a simpler version of this model, aimed
at testing the idea that groups with FFD could be more
efficient than cohesive groups at exploiting ephemeral and
unpredictable patches, such as tropical trees, which have short 
fructification periods, each species with fruit at different times 
of year (Rathcke and Lacey, 1985). In this environment, food 
patches have a randomly assigned amount of food, which is 
present for a randomly chosen period of time. Foragers search 
for food using a correlated random walk, and they can be 
either cohesive (all agents move together) or separate (all 
agents move independently, as subgroups of spider monkeys 
do: Ramos-Fernandez et al. 2011). The foraging efficiency is 
calculated as the number of food units obtained over the 
distance traveled by each forager. In order to control for the 
effect of group size on the foraging efficiency, the same 
number of foragers is present in both conditions. Preliminary
results show that the efficiency of separate foragers is in fact 
greater than that of the cohesive foragers. 
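
The two ingredients named above, a correlated random walk and an efficiency measured as food per unit distance, can be sketched as follows. This is an illustration only: the Gaussian turning angle, the fixed step length, and the function names are assumptions, since the text does not specify how the walk was parameterized.

```python
import math
import random

def correlated_random_walk(n_steps, step_length=1.0, turn_sd=0.3, seed=0):
    """Return a list of (x, y) positions; each heading is the previous
    heading plus a small random turn, so successive steps are correlated."""
    rng = random.Random(seed)
    x = y = 0.0
    heading = rng.uniform(0.0, 2.0 * math.pi)
    path = [(x, y)]
    for _ in range(n_steps):
        heading += rng.gauss(0.0, turn_sd)  # assumed turning-angle distribution
        x += step_length * math.cos(heading)
        y += step_length * math.sin(heading)
        path.append((x, y))
    return path

def path_length(path):
    """Total distance traveled along a path of (x, y) positions."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

def foraging_efficiency(food_units, distance):
    """Food units obtained per unit of distance traveled."""
    return food_units / distance if distance > 0 else 0.0

path = correlated_random_walk(10)
efficiency = foraging_efficiency(5, path_length(path))
```

Comparing the mean of `foraging_efficiency` over foragers in the cohesive and separate conditions, with the same number of foragers in each, mirrors the comparison described in the text.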

We have successfully used agent-based models to explore 
the minimum, simplest conditions that could produce a 
collective pattern out of local interactions between agents and 
their environment. Also, these models have served to develop 
predictions to be tested with further fieldwork and to test 
hypotheses about the adaptive function of FFD. 
Why do organisms cooperate with each other? This seemingly
simple question has motivated a staggering amount of 
theoretical and experimental research. When cooperating with 
others carries a direct fitness cost for the individual, natural 
selection should act against such behavior. However, 
cooperation is widespread across natural systems, from birds 
and bees to bacteria. Past research has identified many factors 
favoring or disfavoring the evolution of cooperation. For 
example, we know that properties of public good molecules 
(Misevic et al. 2012) affect the evolution of cooperation. Here 
we summarize the results and provide additional discussion 
about the implications of our recently published work on the 
importance of a previously overlooked factor, the population
shape (Misevic et al. 2015). 

In the past, we have studied different aspects of 
cooperation using a well-established in silico system, Aevol
(Frenoy et al. 2013). In Aevol, digital organisms with
double-stranded binary genomes and complex genetic architecture
mutate, compete, and evolve over thousands of generations. 
The ancestral organism is not a cooperator, but populations 
may evolve to secrete different amounts of public good. The 
secretor pays a cost proportional to the amount of public good 
it produces, and all neighboring individuals benefit equally 
from the secreted molecules. The secreted molecules diffuse
to neighboring cells and degrade over time. Generations are 
synchronous, with nine individuals in the classical Moore 
neighborhood competing to populate the next generation. 
For the purpose of the study on the population shape, we have 
also introduced two new, simpler systems, Aevol-lite and 
CAevol. Aevol-lite has all the properties and simulation 
mechanics of Aevol, but instead of binary strings and non¬ 
trivial genotype to phenotype to fitness mapping, each Aevol- 
lite individual is represented by a single number, a binary 
gene identifying whether it secretes a public good or not. 
CAevol is a further simplification, a system without public 
good, in which individuals with pure cooperate/defect 
strategies play a classical Prisoner’s dilemma. 

Aevol individuals live on a quadrilateral grid with periodic 
boundary. All the locations on the grid are always full. We 
studied two different population shapes, a bulky 100x100
torus (akin to a fat doughnut) or a slender 4x2500 one (akin to 
a slender bicycle tire). It is interesting to note that we noticed 
the effect by accident, while studying a different population 
property and inadvertently modifying the shape as well. We 
corrected the mistake in the code, but the result remained: 
more secretion evolved in bulky than in slender populations 
(Figure 1A). This evolutionary outcome was highly robust and
not affected by changes in any of the other cooperation
parameters, such as cost and benefit of secretion, or the 
diffusion and degradation rates. 

The result was not intuitive and did not lend itself to an 
obvious explanation. We expected slender populations to 
facilitate separation between secretors and non-secretors,
leading to fewer interactions, thus selecting for cooperation. 
In order to explain the prevalence of cooperation in 
populations of different shape, we closely examined the 
dynamics of cooperating patches (sub-populations).
Additionally, we decided to simplify our model system, 
moving from Aevol onto first Aevol-lite and finally CAevol. 
This allowed us to rule out system idiosyncrasies, or plain 
bugs in the code, and test the generality of the results. Indeed, 
bulky populations evolved more cooperation no matter what 
particular simulation system or mode of cooperation (public 
good vs. Prisoner’s dilemma) we used (Figure 1).

Studying cooperation in Aevol-lite and CAevol, where 
there are only two types of individuals, made it much easier to 
visualize the population over time. We suspected that 
populations are not clonal, consisting of identical individuals, 
but instead a diverse assembly of cooperators and non¬ 
cooperators in a dynamic equilibrium. Indeed, by plotting 
Aevol-lite populations we saw exactly that: populations are a
collection of expanding and shrinking patches of cooperators 
and non-cooperators, constantly taking over one another. And 
it is in the dynamics of these patches that we found the answer 
for the population shape effect. 




Figure 1. Average cooperation in bulky (100x100) and 
slender (4x2500) populations in (A) Aevol, (B) Aevol-lite, 
and (C) CAevol (from Misevic et al. 2015). Cooperation 
after 50,000 generations is quantified as the average amount 
of the public good secreted (A) or the percentage of 
cooperators (B and C). The line marks the median, the box edges
are the 25th (q1) and the 75th (q3) percentiles, and the whiskers
extend to the most extreme data points still smaller than
q3 + w(q3 - q1) and larger than q1 - w(q3 - q1), where w = 1.5.
In slender populations, the population shape constrained
the expansion of cooperator patches. This can clearly be seen 
in Figure 2B, zooming in on a part of a single population over 
time, where after the first few generations, the cooperator 
patch (shown in gray) can increase only by eight individuals, 
four on the left and four on the right. In contrast, in the bulky 
population, such a patch can expand on all four sides, 
potentially by many more individuals in each generation. In 
Figure 2A, which focuses on a section of a bulky population 
over time, we see exactly such fast expansion. In both cases,
the patches are eventually invaded from within by non-cooperators,
which arise as mutants. However, before the cooperator patch gets
completely overrun, it has a chance to expand to a greater size in
bulky than in slender populations. And that is precisely the reason
for more cooperation in bulky than in slender populations: in all
cases cooperator patches arise continuously, only to be taken over
by non-cooperators,
but in bulky populations they grow bigger, resulting in more 
cooperators present in the population at any given point in 
time. We quantified this difference by exhaustively simulating 
a single cooperator patch from its inception, through its 
ultimate demise. The results confirmed our verbal analysis of 
patch dynamics: no matter what the rate of mutation from 
cooperator to non-cooperators, over the entire lifetime of a 
patch, more cooperators existed in the bulky than slender 
population. This patch analysis was done in Aevol-lite, and 
was further confirmed in Aevol. Using the Markov Cluster
Algorithm, we identified and measured the number and size of
clusters formed by individuals based on the amount of public
good secretion. We found that those clusters were smaller and
more numerous in slender than in bulky populations, exactly in
line with what we saw in Aevol-lite. 
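
The geometric constraint described above can be illustrated with a short sketch. It computes only an upper bound on patch growth, assuming a patch spreads to every cell in its Moore neighborhood each generation on a periodic grid; selection, mutation, and invasion by non-cooperators are deliberately left out, and the function name is hypothetical.

```python
def max_patch_sizes(rows, cols, generations):
    """Sizes of a patch that spreads to all Moore neighbors each generation
    on a rows x cols grid with periodic boundaries, starting from one cell."""
    patch = {(0, 0)}
    sizes = [len(patch)]
    for _ in range(generations):
        grown = set()
        for r, c in patch:
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    # Modulo arithmetic wraps the grid into a torus.
                    grown.add(((r + dr) % rows, (c + dc) % cols))
        patch = grown
        sizes.append(len(patch))
    return sizes

bulky = max_patch_sizes(100, 100, 4)   # [1, 9, 25, 49, 81]
slender = max_patch_sizes(4, 2500, 4)  # [1, 9, 20, 28, 36]
```

Once the patch spans the four-cell width of the slender torus, it gains exactly eight cells per generation (four on each side), whereas the bulky patch keeps gaining more cells each generation as its perimeter grows.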

After hundreds of populations, millions of generations, 
and billions of individuals, we confirmed that shape does 
matter for cooperation and were able to explain the effect 
through the analysis of within-population patch dynamics. But 
what does our result mean for simulations of cooperation or 
for the study of evolution of cooperation in general? After all, 
population shape, as we defined it here, seems to be a rather 
peculiar parameter. However, it is not as obscure as it seems 
to be, since we can certainly think of cooperating populations 
living in complex 3D structures (soil, human lung), 2D planes
(petri dish), or even effectively 1D (filamentous
cyanobacteria). While we have not extended the simulation to 
different dimensions, our analysis indicates that the results 
would hold: more cooperation in populations of higher 
dimensions. Moreover, for some time there has been a push to 
consider different, more complex population structures, 
namely graphs (Ohtsuki et al. 2006). Our results make a 
strong argument that such treatment is useful and potentially 
necessary when considering the evolution and maintenance of 
cooperation, in silico or in vivo. The conclusions from the 
abstract bulky and slender populations directly extend to 
populations in which individuals have different numbers and
strengths of cooperative interactions, suggesting that graphs
with higher connectivity will promote cooperation. 

Finally, the heterogeneous and complex map of 
interactions between actors has already been studied in 
epidemiology (Salathe et al. 2010). A recent study using a 
microbial system established a connection between 
cooperation and information transfer (Dimitriu et al. 2014),
which allows us to make a connection between two fields
here. In cooperation as well as epidemiology, all properties
relating to the interactions between individuals, including the
population shape and structure, should be considered because
they may constrain and alter the maintenance and spread of
a (potentially infectious) cooperative trait.




Figure 2. Example of a full lifecycle of a cooperator patch
in (A) a bulky and (B) a slender Aevol-lite population
(adapted from Misevic et al. 2015). Each cell is either a
cooperator (dark) or a non-cooperator (white). Each
square in panel (A) and rectangle in panel (B) represents a
snapshot of a population region from a single generation, with
generations increasing from left to right, top to bottom.
Abstract

While it has been observed (Hornby et al., 2001) that devel¬ 
opmental encodings in evolved systems may promote mod¬ 
ularity, there has been little quantitative study of this phe¬ 
nomenon. There has also been little study of the factors driv¬ 
ing the emergence of hierarchical modularity - modularity on 
multiple levels, in which the modules found at a finer-grained 
level can serve as elements in a coarser-grained network that 
is also modular - despite the fact that most fields with an inter¬ 
est in modularity, including biology and engineering, define 
hierarchy as an important aspect of modularity. We exam¬ 
ine the effect of developmental encodings on the emergence 
of multiple levels of modularity through the lens of two de¬ 
velopmental systems, GRNEAT and GENRE, and find evi¬ 
dence that developmental encodings promote this emergence 
of modular hierarchy. 

Introduction 

Below, we examine interactions between development and 
hierarchical modularity in artificial systems. Modularity, the 
organization of a system into a hierarchical system of inter¬ 
acting subparts, is observed in many systems both natural 
and engineered (Koza, 1992; Simon, 1996; Hartwell et al., 
1999), and has become important as evolutionary systems 
are used in increasingly complex applications. Simulations 
of development, the process by which a mature phenotype 
is constructed from an organism’s genetic code, have been 
used in computational studies both in conjunction with and 
distinct from simulations of evolution. We briefly discuss 
modularity in evolution, followed by an overview of artifi¬ 
cial development. 

Modularity 

Biological systems, including biological networks such as 
neural networks and bacterial metabolic networks, and other 
kinds of biological systems such as tissues (which are as¬ 
sembled from cells), tend to be modular. The definition of 
modularity is somewhat vague - though generally referring 
to the degree to which a system is composed of separable, 
recombinable components - and can be used differently in 
different fields and subfields. Bolker (2000) attempted to
define a list of characteristics of modularity that would be
appropriate across different subfields and levels of study in 
biology, including greater internal integration of modules 
as compared to external integration, the ability to delineate 
modules from their surroundings, and module performance 
that is greater than the sum of its parts. Schilling (2002) 
found that a variety of fields, including technology, psychol¬ 
ogy, biology, American studies, and mathematics, define hi¬ 
erarchical nesting as an aspect of modularity. It is worth 
noting that the hierarchical aspect of modularity, the emer¬ 
gence of which we explore in this paper, has not traditionally 
been examined in simulated evolution studies, despite its im¬ 
portance in how most fields define modularity. We chose to 
focus on hierarchy because of this gap in the literature, and 
because development is such a key factor in the formation 
of many hierarchically modular biological systems, such as 
organisms. 

Evolutionary algorithms tend to produce nonmodular so¬ 
lutions - though there are some exceptions, as in the coevo¬ 
lutionary algorithm of Juille and Pollack (1996), which used 
genetic programming to produce modular solutions to the in¬ 
tertwined spirals problem. These nonmodular solutions are 
often connected in complicated ways and perform better on 
the task for which they are optimized than the more modular 
solutions produced by human designers (Thompson, 2012;
Vassilev et al., 2000). However, while this tendency against
modularity can produce well-performing solutions for sim¬ 
ple problems, it makes it difficult for evolved systems to 
solve complex problems (Kashtan and Alon, 2005). While 
this issue can be addressed by building the encapsulation 
of modules into algorithms, this does not illuminate how 
modularity evolves in nature, and it means possibly miss¬ 
ing out on some design benefit that comes with modular¬ 
ity emerging rather than being hard-coded. In addition, al¬ 
lowing modularity to emerge through an iterative process 
may allow for nonmodular, high-performing species of solu¬ 
tions to develop modularity over time while preserving their 
strong performance. 

In recent years, there have been several studies examining
the emergence of modularity in both natural and simulated
evolution. Lipson et al. (2002) found, in a study of minimal
substrate modularization, that modular separation is
logarithmically proportional to rates of environmental
variation, and suggested using variable rather than fixed
fitness criteria for the evolutionary design of engineered
systems. This hypothesis was supported by the work of Kashtan
and Alon (2005) in computational evolution studies, and of
Kashtan et al. (2007) and Parter et al. (2007) in natural evolution
(Kashtan et al., 2007; Parter et al., 2007) in natural evolution 
studies, which found that modularity evolves in response to 
varying environments (called modularly varying goals) in 
which individuals perform varying tasks that are decompos¬ 
able into common subtasks. The requirement that subtasks 
be performed in sequence, as a chain, has also been found to 
promote the evolution of modularity (Calcott, 2014). An¬ 
other possible explanation for modularity’s evolution was 
proposed by Clune et al. (2013), who suggested that modular
networks evolve in response to a small decrease in fitness
for each connection in the network - a connection cost
- representing the energy cost of forming a link in a phys¬ 
ical network. A similar energy cost imposed on the NEAT 
neuroevolution algorithm, on a problem in which some so¬ 
lutions that evolve are modular, has been found to increase 
consistency in modularity emergence (Lowell and Pollack, 
2014). 

Artificial Development 

Artificial development, also known as artificial embryology, 
is an area of artificial life that models biological processes 
of development, in which there are layers of abstraction be¬ 
tween a genotype and a phenotype. The phenotype begins 
with a seed or embryo and progresses toward maturity ac¬ 
cording to a set of rules or interactions. The individuals 
on which evolutionary or other forces are acting are these 
processes by which the embryo develops. Developmental 
systems and other forms of indirect encodings of solutions 
can be contrasted with direct encodings, in which each com¬ 
ponent of the phenotype is made explicit in the genotype. 
As the problems being solved by evolutionary computation 
have grown in complexity, scalability has become an im¬ 
portant aspect of the design of new evolutionary compu¬ 
tation techniques, and certain properties of developmental 
systems, such as compact genotypes, lend themselves well 
to scalability (Bentley and Kumar, 1999; Hornby and Pol¬ 
lack, 2001a), which motivated much early research on artifi¬ 
cial development (Tufte, 2008). A system that uses artificial 
development may be called a generative or developmental 
system. 

Figure 1: Visualization of an example brick “table” structure
produced by GENRE.

Often, these developmental encodings are based on biological
developmental processes and principles. Doursat (2009) used
lower-level developmental processes such as cell
division/differentiation and morphogen gradients to create a
self-patterning “organic canvas,” and Miller and Banzhaf (2003)
created a model for the programming of a cell, using cell
division and simulated chemical environments, that was able to
recreate a French flag and other
patterns. Other approaches have involved the use of sim¬ 
ulated gene regulatory networks (Guo et al., 2009), the ex¬ 
ploitation of biological principles of degeneracy (Whitacre 
et al., 2010), and the evolution of grammars to generate pro¬ 
grams or expressions in a given language (O’Neill and Ryan, 
2001). 

Below, we provide a brief overview of the two different 
generative systems used in this study. 

GENRE 

GENRE (Hornby and Pollack, 2001b) is a developmen¬ 
tal system that was designed to create more complex vir¬ 
tual creatures than had been created using earlier artificial 
life techniques. It took a grammatical approach, evolving 
Lindenmayer systems (L-systems; Lindenmayer, 1968),
parallel grammatical rewriting rules originally designed to 
model plant growth, that took in parameters and would 
generate creatures with hundreds of components. The L- 
systems were applied iteratively to rewrite strings of com¬ 
mands through which to construct creatures or other struc¬ 
tures, such that complex strings were constructed from sim¬ 
ple ones. Hornby analogized the parallel nature of the rules, 
and the repetitive structures that they tended to produce, to 
concurrent cell division. The system outperformed a non- 
generative system on performance, creature size, and natural 
look, when applied to the design of mobile robots and
block-based “table” structures. Hornby et al. (2001) observed that
the generated robots appeared to exhibit modular properties, 
but did not attempt to quantify this. An example GENRE-produced
structure can be seen in Fig.1.

Figure 2: Visualization of an example artificial gene regulatory
network (GRN) produced by GRNEAT.
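
The parallel rewriting that L-systems perform, as used in GENRE, can be illustrated with a minimal sketch. GENRE evolves parametric L-systems with condition-successor pairs; the version below is a much simpler context-free variant with no parameters or conditions, shown only to make the rewriting step concrete. The function name and the example rules (Lindenmayer's classic algae system) are not taken from GENRE.

```python
def rewrite(axiom, rules, iterations):
    """Apply L-system production rules in parallel for a number of iterations.

    Every symbol in the current string is replaced simultaneously; symbols
    without a rule are copied unchanged.
    """
    current = axiom
    for _ in range(iterations):
        current = "".join(rules.get(symbol, symbol) for symbol in current)
    return current

# Lindenmayer's algae system: A -> AB, B -> A.
algae_rules = {"A": "AB", "B": "A"}
result = rewrite("A", algae_rules, 3)  # "ABAAB"
```

Because each rule is reused wherever its symbol appears, a short rule set can generate long strings with repeated substructure, which is the property the text associates with compact genotypes and repetitive structures.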

GRNEAT 

GRNEAT (Cussat-Blanc et al., 2015) evolves artificial gene 
regulatory networks, or GRNs (Banzhaf, 2003), which are 
simplified models of the genetic regulatory networks seen 
in biological systems and are used to control various kinds of
agents. The GRNEAT algorithm evolves lists of proteins, 
which are then developed into network models with ma¬ 
trices of enhancing and inhibiting weights between nodes, 
mimicking the developmental module function of biologi¬ 
cal GRNs. The protein lists are used to initialize a GRN, 
which then updates its weights by calculating interactions 
between the proteins. It is based on NEAT (Stanley and 
Miikkulainen, 2002), a well-known algorithm for evolving 
neural networks, and retains NEAT’s major distinguishing 
features: initialization with small networks, a crossover op¬ 
erator that preserves subnetworks during GRN recombina¬ 
tion, and the use of speciation to give growth opportunity 
to potentially promising innovations. However, a key differ¬ 
ence is that artificial GRNs are inherently a developmental 
encoding, as biological gene regulatory networks are, while 
NEAT is a direct encoding algorithm. An example GRNEAT 
network, visualized in Gephi (Bastian et al., 2009), can be 
seen in Fig.2.

Methods 

Figure 3: Illustration of two intertwining spirals, which must
be distinguished from each other in the intertwining spirals
problem.

To test the effects of development on hierarchical modularity,
we used the GENRE algorithm, which uses L-systems
to model parallel cell division, and the GRNEAT algorithm,
which evolves protein lists for construction of artificial gene
regulatory networks in a manner mimicking NEAT’s
neuroevolution methods, both of which are described briefly
above. We chose to compare GRNEAT to NEAT because,
as stated above, GRNs are developmental by nature, so we 
could not simply compare a developmental GRN encoding 
to a nondevelopmental one. We ran GENRE on the brick- 
table-building problem that was one of its original test prob¬ 
lems in (Hornby et al., 2001), which rewards individuals for 
minimizing the number of bricks and maximizing height,
surface area, volume, and stability, and compared the results 
to those obtained by a non-developmental evolutionary al¬ 
gorithm that is packaged with GENRE for the purpose of 
running comparisons. We ran the GRNEAT evolutionary 
process on the problem of distinguishing two intertwined 
spirals, also called the intertwined spirals problem (Lang, 
1988), which is illustrated in Fig.3, with fitness being mea¬ 
sured by error ranging from -1 to 0, and compared the results 
to those obtained by running both feedforward and recurrent 
versions of the NEAT4J open source Java implementation of 
NEAT (Simmerson, 2006) on the same problem. 

Key parameters for GENRE and for GRNEAT/NEAT are 
listed in Table 1 and Table 2 respectively. While the com¬ 
parisons between GRNEAT and NEAT were done primar¬ 
ily using simulations of 250 generations, we also did a set 
of runs of GRNEAT that were only 10 generations, to see 
whether any modularity that existed was actually emerging 
over time or was present early in the simulation. We did 
not do 10-generation runs for NEAT because it had made al¬ 
most no progress at solving the intertwining spirals problem 
after only 10 generations. In the NEAT4J implementation 
of NEAT, there is an option to allow or disallow recurrent
neural networks. We decided to allow recurrency for a more 
even comparison, as GRNEAT produces recurrent networks. 
To prevent either the GRNs or neural networks from simply
memorizing a sequence of outputs rather than learning a mapping
from coordinates to spiral ID, in both GRNEAT and NEAT, 
we used a fresh copy of the pre-initialized network for each 
new input. 


Problem Version    Trials    Generations    Num R, P, C
GENRE              10        100            10, 2, 2
Nongenerative      10        100            1, NA, NA

Table 1: Key parameters in GENRE experiments. R is the
number of production rules, P is the number of parameters
per rule, C is the number of condition-successor pairs per
rule.


Problem Version    Trials    Generations    pC, pM, PopSize
GRNEAT             20        10             0.25, 0.75, 500
GRNEAT             20        250            0.25, 0.75, 500
NEAT               20        250            0.25, 0.75, 500


Table 2: Key parameters in GRNEAT experiments. pC is the
probability of crossover, pM is the probability of mutation,
and PopSize is the population size.

In order to look at the quantitative modularity of
GENRE’s and its non-developmental counterpart’s brick table
structures, we needed to represent the structures as networks.
To do that, we defined each brick as a node,
and each case of a face of one brick touching a face of another
brick as a link. The link structure was binary, with
all links being represented in the adjacency matrix as having
a value of 1, and all other elements of the adjacency
matrix having a value of 0. This was not necessary for
GRNEAT/NEAT, as both GRNs and neural networks are already
represented as networks, with links having non-binary
weights. Since GRNEAT produces both a matrix of enhancement
weights and a matrix of inhibition weights, we
combined them into a single weight matrix by subtracting
the inhibition factors from the enhancement factors.

Many artificial life and theoretical biology studies of
modularity use the metric Q, defined by the approach of
Newman and Girvan (2004). This approach determines Q
by looking at the percentage of edges in the network that
connect nodes in the same module, and subtracts the expected
value for that percentage in a network with the same
number of modules but random connections. The modules
are defined by a previous part of the algorithm that splits the
network into the modules that would maximize Q. Mathematically,
the equation for Q in the Newman-Girvan algorithm is:

$$Q = \sum_{s=1}^{K} \left[ \frac{l_s}{L} - \left( \frac{d_s}{2L} \right)^2 \right] \tag{1}$$

where $L$ is the number of edges, $K$ is the number of modules,
$d_s$ is the sum of degrees of nodes in module $s$, and $l_s$
is the number of edges in that module.

This method is very useful for examining a single layer
of modularity in binary networks (i.e. networks where there
is either a connection between two nodes or there is not).
However, the weights of links between nodes in GRNs can
vary by several orders of magnitude. Both GRNs and recurrent
neural networks may benefit from a modularity metric
that can account for directedness. In addition, the Newman-Girvan
approach only looks for one layer of modularity, rather than
for hierarchical modularity. Accordingly, we used the “Louvain
method,” which was designed for speed, maximization
of community detection, and the detection of hierarchical
levels of modularity, to determine Q (Blondel et al., 2008).
In the Louvain method, each node in the network is initially
assigned to its own module, and the modularity Q is calculated
according to the following equation for a weighted
graph:

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j) \tag{2}$$

where $A_{ij}$ represents the edge weight between nodes $i$ and $j$,
$m$ represents half the sum of the graph’s edge weights, $\delta$ is
a delta function, $c_i$ and $c_j$ are node communities, and $k_i$ and
$k_j$ are the sums of the weights of all edges attached to node
$i$ and node $j$ respectively.
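
The network construction and the weighted modularity of Equation 2 can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation used in the study: the function names are hypothetical, the matrix is assumed to be symmetric (undirected), and the example network (two triangles joined by one link) is invented for the purpose.

```python
def contact_adjacency(n_bricks, contacts):
    """Binary adjacency matrix: 1 where two bricks share a face, else 0."""
    A = [[0] * n_bricks for _ in range(n_bricks)]
    for i, j in contacts:
        A[i][j] = 1
        A[j][i] = 1
    return A

def combine_weights(enhance, inhibit):
    """Single weight matrix: enhancement minus inhibition, element by element."""
    return [[e - h for e, h in zip(e_row, h_row)]
            for e_row, h_row in zip(enhance, inhibit)]

def modularity(A, community):
    """Weighted modularity Q of Equation 2 for a symmetric matrix A.

    community[i] is the module label of node i.
    """
    two_m = sum(sum(row) for row in A)  # 2m: every edge is counted twice
    if two_m == 0:
        return 0.0
    k = [sum(row) for row in A]         # weighted degree of each node
    q = 0.0
    for i in range(len(A)):
        for j in range(len(A)):
            if community[i] == community[j]:  # the delta function
                q += A[i][j] - k[i] * k[j] / two_m
    return q / two_m

# Hypothetical example: two triangles of bricks joined by a single contact.
contacts = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = contact_adjacency(6, contacts)
q = modularity(A, [0, 0, 0, 1, 1, 1])  # 5/14, about 0.357
```

For this binary example, Equation 1 gives the same value: each module has $l_s = 3$ and $d_s = 7$ with $L = 7$, so $Q = 2(3/7 - 1/4) = 5/14$.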

Then, for each node, the algorithm calculates the change 
in modularity, the equation for which depends on whether 
the version of the algorithm for directed or undirected graphs 
is being used, for moving that node into the module of each 
of its neighbors. Once this change is calculated for all mod¬ 
ules that the node is connected to, the node is moved into 
the module that would result in the greatest modularity in¬ 
crease (or left in place if no modularity increase is possible). 
If no increase is possible, the first level of Q is equal to the 
current modularity of the network. Subsequent, hierarchical 
levels of Q are calculated the same way in subsequent phases
of the algorithm, by using the modules from the previous 
level as nodes in a new network. For our study, we used An¬ 
toine Scherrer’s MATLAB implementation of the Louvain 
method (Scherrer, 2008). 
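
The step that produces the next hierarchical level, using the modules from the previous level as nodes in a new network, can be sketched as follows. This is a simplified illustration and not Scherrer's MATLAB code; the function name and the example network are assumptions.

```python
def aggregate(W, community):
    """Collapse each module into one node of a coarser weighted network.

    The weight between two coarse nodes is the sum of all weights between
    their member nodes. Diagonal entries hold the weight internal to a
    module, so the total weight of the network is preserved.
    """
    labels = sorted(set(community))
    index = {label: position for position, label in enumerate(labels)}
    size = len(labels)
    coarse = [[0] * size for _ in range(size)]
    for i, row in enumerate(W):
        for j, weight in enumerate(row):
            coarse[index[community[i]]][index[community[j]]] += weight
    return coarse

# Hypothetical example: two triangles joined by one link, stored as a
# symmetric 0/1 matrix, with each triangle forming one module.
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
coarse = aggregate(W, [0, 0, 0, 1, 1, 1])  # [[6, 1], [1, 6]]
```

Running the node-moving phase again on `coarse` either merges some of these modules, giving a further level of hierarchy, or finds no modularity increase, in which case the hierarchy ends at the current level.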

Figure 4: The mean number of levels of hierarchical modularity
produced by a developmental network-evolving system,
GRNEAT, was higher than that produced by the
non-developmental one on which it is based, NEAT. The 95%
CIs for GRNEAT and NEAT were 1.85-2.05 and 1.09-1.51.
Number of trials N = 20, p < 0.0001.

Because modularity can be positive or negative (where
negative modularity means that there is less internal integration
among modules than one would expect to see in a random
graph), we defined a level of modularity as occurring
when the Louvain algorithm produces a positive modularity
value, with an allocation of nodes into modules that (if there
are lower levels) involves combining some or all modules
from the next-lowest level. To determine whether the num¬ 
bers of hierarchical levels of Louvain modularity were equal 
in our sets of results, we used Welch’s t-test, a variant of the 
traditional Student’s t-test that is robust to non-normality in 
data and difference in variance between samples. We did not 
test for differences in the actual Q values of the lowest or 
other levels, as they were tangential to the question of hier¬ 
archy. In practice, Q values for specific levels were between 
0.12 and 0.52 for both GRNEAT and NEAT (with most be¬ 
ing between 0.2 and 0.4, indicating moderate amounts of 
single-level modularity), and between 0.3 and 0.72 for both 
GENRE and its direct encoding counterpart. 
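
For reference, the Welch statistic and its approximate degrees of freedom (the Welch-Satterthwaite formula) can be computed as in the sketch below. The function name and the sample values are hypothetical and are not data from this study; obtaining a p value additionally requires the t distribution, which is omitted here.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean (sample variance / n).
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical samples with clearly different variances.
t, df = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
```

Unlike Student's t-test, the two variances are not pooled, so the degrees of freedom are generally non-integer and smaller than the pooled value of n_a + n_b - 2.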

Results and Discussion 

Our first comparisons were between a set of 20 trials of 
GRNEAT on the intertwined spirals problem and 20 trials
of NEAT with recurrency allowed on the same problem,
with mutation probability = 0.75 and crossover
probability = 0.25, across 250 generations. In Fig.4, we can
see that the best solutions produced in the GRNEAT trials
had a mean number of levels of modularity of 1.95, while
those produced in the NEAT trials had a mean number of
levels of modularity of 1.3, a full 33% lower. This difference
in the means was statistically significant (p < 0.0001).
The emergence of multiple levels of modularity in one GRN 
is shown in Fig.5. 

We wanted to examine whether the increased levels of
modularity seen in GRNEAT emerged during evolution
rather than being hard-coded into all GRNs.


Figure 5: a) A GRNEAT-produced GRN with the links lightened
for easier viewing. The nodes are colored according
to the 8 first-level modules found by the first phase of the
Louvain algorithm, which found that this level of modularity
had Q = 0.3018. b) A single module of the GRN. c) The
network of the next level of hierarchy, with each of its nodes
representing, and color-coded as, a module from the previous
level. d) The same next-level network, with Q = 0.3266,
with the nodes colored in five new colors according to the
5 second-level modules found by the second phase of the
Louvain algorithm.


Because the average fitness of the NEAT neural networks
after 250 generations was notably worse than that of the
GRNEAT GRNs, and the networks notably smaller (see
Fig. 6), we also wanted to compare the NEAT networks to
GRNs of more similar fitness and size. Accordingly, we
compared the 20 250-generation GRNEAT trials to 20
10-generation GRNEAT trials (Fig. 7), and, as the 10-generation
GRNEAT GRNs were similar in size and fitness to the NEAT
neural networks, we compared the 10-generation GRNEAT
trials to the NEAT trials (Fig. 8).

We can see from these figures that the mean hierarchical
modularity of GRNEAT-produced GRNs (along with their
size) increased by nearly a third (a mean of 1.5 levels of
modularity vs 1.95 levels) between the 10th and 250th
generations. We can also see tentative evidence (with a p value
that is low but not statistically significant) that GRNEAT
GRNs already have greater hierarchical modularity after 10
generations than NEAT recurrent neural networks have after
250, despite being nearly the same size, which suggests
that this increased hierarchy is not solely a function of
network size.

While the most obvious difference between GRNEAT
and NEAT is the artificial development aspect, it is possible
that some other factor influences the development
of hierarchical modularity. Therefore, we looked
at a different developmental system using a very different
mechanism of development, GENRE, and compared it to
the direct encoding algorithm packaged with its implementation
on the GENRE homepage (Hornby, 2001), as discussed
in the Methods section. One useful aspect of looking
at GENRE’s brick-table structures is that the default fitness
function for these structures encourages minimizing the
number of blocks while maximizing other structural criteria.
Although, because of the nature of the block structures,
the GENRE network representations were much larger than
the GRNEAT or NEAT representations, they were actually
smaller than those of the direct encoding alternative (640
blocks vs 768 blocks). This addresses the possibility that network
size is a major contributor to the different levels of hierarchy
produced by a developmental vs a direct encoding.

Figure 6: 250 generations of GRNEAT produced top solutions
with greater mean fitness than 250 generations of
NEAT. 250 generations of NEAT still had greater fitness than
10 generations of GRNEAT.

As can be seen in Fig. 9, there is a statistically significant
(p = 0.0246) difference between the levels of hierarchical
modularity produced by GENRE and by its nondevelopmental
alternative: the GENRE-produced structures
had an average of 4.7 levels, and the others had an
average of 4.2.

Notably, in both cases, the number of levels was far higher
than for GRNEAT or NEAT, regardless of development, and
the network sizes were much larger, which suggests that
network size may play some role in the number of levels
of hierarchical modularity. However, the fact that GENRE
structures have more levels than those of a nondevelopmental
algorithm optimizing for the same fitness function, in the same
number of generations, despite being 17% smaller, provides
further evidence that the developmental encoding is doing
some of the work in the emergence of this hierarchy.
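The relative differences quoted in this section follow directly from the reported means and block counts. The short check below recomputes them; the helper name `relative_change` is ours, and the inputs are the values stated in the text.

```python
def relative_change(old, new):
    """Signed change from old to new, as a fraction of old."""
    return (new - old) / old


# Mean levels: NEAT (1.3) relative to GRNEAT (1.95), both at 250 generations.
neat_vs_grneat = relative_change(1.95, 1.3)
# Mean levels: GRNEAT at 250 generations (1.95) relative to 10 generations (1.5).
gen10_to_gen250 = relative_change(1.5, 1.95)
# Size: GENRE (640 blocks) relative to the direct encoding (768 blocks).
genre_vs_direct = relative_change(768, 640)

print(f"{neat_vs_grneat:.1%} {gen10_to_gen250:.1%} {genre_vs_direct:.1%}")
```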


Figure 7: Networks produced by GRNEAT after 250 generations
had greater mean levels of hierarchical modularity
than networks produced by GRNEAT after 10 generations.
The 95% CIs for GRNEAT and 10-generation GRNEAT
were 1.85-2.05 and 1.28-1.72. Number of trials N = 20,
p = 0.0014.


Conclusion and Future Work 

This work opens up two different areas of study. One is the
study of the effects of developmental encodings on the emergence
of modularity. This is an underexplored area, with the
potential to contribute to our understanding of the emergence
of modularity in general. To our knowledge, this is the first
time that developmental encoding effects on the emergence
of modularity have been quantified. Another is the study of
hierarchical modularity. While modularity is an active area
of research, as we outlined earlier in this paper, the hierarchical
aspect of modularity has been ignored in
modularity-emergence studies despite the importance of hierarchy
in biological and other understandings of modularity. This paper
takes a first step toward remedying this oversight.

Our results suggest several more specific avenues for future
work. It may be useful to compare other developmental
systems to similar nondevelopmental ones, to see
whether the same effect is observed. Even though GENRE
and GRNEAT use very different mechanisms for encoding
development, increasing the number of systems studied
may provide more evidence that the effect seen here is
mechanism-independent. Another possibility would be to
adjust different parameters within the developmental systems,
to see if other factors can be identified that promote the
emergence of hierarchical modularity. Finally, it would be
interesting to study the effects of development in biological
systems, as was done by Kashtan et al. (2007) and Parter et al.
(2007) to determine the effect of varying goals on modularity.

Figure 8: The mean number of levels of hierarchical modularity
produced by only 10 generations of GRNEAT was
higher than that produced by 250 generations of NEAT. The
95% CIs for 10-generation GRNEAT and NEAT were 1.28-1.72
and 1.09-1.51. Number of trials N = 20, p = 0.2.

Abstract

Many organisms can regenerate their bodies, but it is currently
unclear how they accomplish this feat. In this paper,
we introduce a cell-to-cell communication mechanism that
allows a 3D arrangement of cells to discover its structure and
maintain it in the light of random cell death, even at very
high death rates. We report results from simulations of an
agent-based model that demonstrate the effectiveness of the
proposed approach for Planarian worm-like shapes, but the
proposed model is general and applies to any shape.

Introduction 

Biological organisms have the ability to regenerate themselves
(Birnbaum and Alvarado, 2008), i.e., they are able
to detect and reproduce damaged cells that make up their
morphological structure. In some cases, whole body parts
(e.g., limbs, tail, etc.) can be regenerated, and the question
arises how this information is encoded and where it is stored
(Friston et al., 2015; Pezzulo and Levin, 2015). While current
orthodoxy would still point to genetic encodings and
thus to morphological information being stored in and recovered
from gene expressions, there is converging evidence
that this might not be so, at least not in all cases (see the
next section). Some of the evidence (reviewed in Lobo et al.
(2014)) comes from studies where morphological changes
performed on organisms were regenerated after they were
lesioned (e.g., damage to deer antlers can result in ectopic
growths at the same spot of injury, and these growths persist
through several subsequent sheddings and regenerations of
the deer’s antlers (Bubenik, 1990)). Since there were no
opportunities for genes to encode those initial morphological
changes, the information must have been stored elsewhere.
But if morphological information is not stored genetically,
what other mechanisms could be accountable for representing
the morphological structure of an organism?

In this paper, we propose a dynamic messaging mechanism
that, while not yet mapped onto a biological substrate, can
functionally explain how morphological information can be
obtained, stored, and used to repair structural damage to
organisms. Specifically, in contrast to genetic encodings,
where information is local to each cell, statically encoded
in the genes and thus retrievable only locally, the proposed
mechanism is distributed, dynamic, and integrates information
across cells. Hence, it is able to detect when cells are
missing in a structure and start a regrowth process that generates
exactly and only the missing parts. We will demonstrate
the operation of the mechanism using an agent-based
model of cell-to-cell communication and a prototypical 3D
organismal shape of a flatworm, and show that for various rates
of random cell destruction (e.g., due to radiation) the organism
is able to maintain its structure. In concluding, we discuss
next steps for further simulations and validations of the
employed principle.

Background and Previous Work 

One of the major questions facing biology and biomedicine
is how groups of cells cooperate to build and maintain
complex anatomical structures. In many animals, this occurs
over long time-spans, counteracting aging, carcinogenic
transformation, and tissue abrasion. An understanding
of the information structures and algorithms that keep
cells orchestrated towards maintaining a large-scale bodyplan
would be very important for regenerative medicine,
aging research, and degenerative disease, as well as having
basic implications for understanding pattern regulation
in evolution (Ingber and Levin, 2007). A key model organism
in regenerative biology is the Planarian flatworm, a complex
bilaterian organism that regenerates amputated pieces and
continuously maintains its bodyplan despite significant
turnover and remodeling (Oviedo et al., 2003). While progress
is being made with models of gene regulation (Lobo and Levin,
2015), we still seek testable models of cellular communication
that explain pattern memory (Tosenberger et al., 2015). Deriving
generative, fully-specified models of pattern regulation in
this kind of model species is an essential goal for converting
molecular-genetic insights into actionable strategies for
manipulating growth and form in regenerative medicine, birth
defects, cancer, and synthetic bioengineering (Doursat et al.,
2013).

The problem of structural maintenance has been approached
by the artificial life community using genetic algorithms,
agent-based models and cellular automata to model
how a single cell could multiply and generate
a whole tissue, and how, after some time, this tissue could
maintain its shape against external or internal perturbations.
Andersen et al. (2009), for example, used a genetic
algorithm to evolve a gene regulatory network which controls
the behaviors of cells. The authors put the specific shapes
that they wanted to create into their fitness function, so that the GA
could find a network (i.e., genotype) which, starting from
a single cell, generates that specific shape (i.e., phenotype).
They concluded that different networks can lead to the same
phenotype and, more interestingly, that the shape becomes capable
of healing wounds even though this process was not encoded
in the fitness function.

In Gerlee et al. (2011), the authors use a genetic algorithm
to evolve a 3-dimensional cellular automaton that creates
and maintains a mono-layer tissue structure. First, the authors
wanted to show that a cellular automaton could evolve
from a network containing just one cell to a 2-dimensional
structure similar to how epithelial cells are organized in most
of the organs of the body. Further, the authors applied
external and internal perturbations to verify whether the model was
capable of returning to its original structure. As in our
proposed model, each cell is a discrete agent which interacts
with its neighborhood; in their model, a cell's behavior also
depends on its concentration of oxygen and of a generic growth
factor. However, in that model the network keeps growing until
it reaches a pre-defined area.

Basanta et al. (2008) also used a 3D cellular automaton
to model the interaction among neighboring cells and used a
genetic algorithm to find a good genotype to perform the
procedure. In this work, a 3D shape is created based on
the cellular automaton’s rule coded in the genes of all cells,
and at some point, some genotypes achieve a state of homeostasis.
After that, lesions were performed on the shape,
and some genotypes were able to regenerate their structures.
The authors verified that the organisms which performed best
in this “wound recovery” were the ones which had a specific
direction by which the cells evolved in the tissue creation.

Overall, past approaches (to the extent that we could find)
used some kind of genetic encoding to define how interactions
among cells should take place. Thus, cells’ behaviors
depend on their neighborhood and are encoded in the genotype.
Our proposed model, on the other hand, does not rely
on any genetic encoding, because the behavior of the cells
depends on the messages they receive. The critical advantage
is that our model does not have any local storage of
shape data, nor does it rely on it; rather, it can dynamically
learn and maintain new morphologies using the same underlying
mechanism.


The Communication Model 

We start by presenting the idea of the proposed cell-to-cell
communication mechanism, followed by the detailed
agent-based model implementing it, described using the now
customary ODD (Overview, Design concepts, Details) protocol
(Grimm et al., 2010).

Discovery and Regeneration 

The purpose of the agent-based model is to investigate possible
structure monitoring and regeneration processes for 3D
cell structures, possibly resembling organismal bodies such
as the Planarian flatworm. Specifically, we intend to propose
mechanisms for such 3D cell structures to dynamically
discover their morphology and then maintain it indefinitely
in the light of random damage happening to parts of it, such
as the damage that occurs as part of natural aging. The basic
idea is that cells can send messages to other cells or forward
messages they receive from other cells, and that these messages
contain information about the path they traveled. This information
can then be checked as a packet travels through the body’s cells,
and if a cell along the way is missing, it must have been
damaged and thus needs to be repaired. To illustrate how
this works, consider the 2D arrangement of cells in Fig. 1.

The packet originating at the bottom right cell during “discovery”
(where packets are randomly generated and only
those are kept whose paths actually reflect paths that can be
taken) consists of three segments of variable length, (0,4),
(1,2), and (0,2), where the first number in each pair indicates
a direction (with 0 being West, 1 being SW, 2 being
SE, and so on) and the second indicates the distance measured
in cells (4 means four cells across). Thus, this packet
structure can specify arbitrary paths with up to two directional
changes to cover cells in a 2D arrangement.

If that packet were now to retrace its path back to its origin
in a lesioned structure and could not find the fourth
cell in a row as predicted by its (0,4) segment, this detection
failure could be used by the cell where the packet got
stuck to grow a new cell in the missing position (as the old
cell residing there must have died). The regrowth now allows
the packet to complete the first segment of its path, and
the same process of regeneration repeats itself for the second
and third segments until all missing cells along the path
have been recovered. Note that not all missing cells are
regenerated, only those discovered by the particular packet
along its path. For the other missing cells to be regenerated,
additional packets with paths going through them would be
needed.
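The retracing idea can be sketched for the 2D case as follows. This is a simplified illustration rather than the model itself: it assumes axial hexagonal coordinates, the direction codes 3 to 5 (E, NE, NW) are our guess at the "and so on" in the text, and the names `HEX_DIRS`, `path_cells` and `backtrack_and_repair` are hypothetical.

```python
# Axial (q, r) offsets on a hexagonal grid. Codes 0-2 follow the text
# (W, SW, SE); codes 3-5 (E, NE, NW) are an assumed continuation.
HEX_DIRS = {0: (-1, 0), 1: (-1, 1), 2: (0, 1), 3: (1, 0), 4: (1, -1), 5: (0, -1)}


def path_cells(origin, segments):
    """List every cell visited by a packet, starting at its origin."""
    cells = [origin]
    q, r = origin
    for direction, distance in segments:
        dq, dr = HEX_DIRS[direction]
        for _ in range(distance):
            q, r = q + dq, r + dr
            cells.append((q, r))
    return cells


def backtrack_and_repair(alive, origin, segments):
    """Retrace the path from its end back to the origin, regrowing
    every missing cell on the way. Returns the regrown cells in order."""
    cells = path_cells(origin, segments)
    if cells[-1] not in alive:
        return []  # the holding cell died, so the packet is lost
    regrown = []
    for cell in reversed(cells[:-1]):
        if cell not in alive:
            alive.add(cell)  # the neighbor on the path regrows it
            regrown.append(cell)
    return regrown


segments = [(0, 4), (1, 2), (0, 2)]  # the example packet from the text
origin = (8, 0)
alive = set(path_cells(origin, segments)) - {(5, 0), (2, 2)}  # two lesions
print(backtrack_and_repair(alive, origin, segments))
```

Only cells on this packet's path are regrown; a dead cell elsewhere would need a different packet whose path passes through it.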

The 3D Spatial Agent-Based Model 

The proposed ABM has just one type of agent, representing
the cells of the organism. Each agent has attributes
that describe it at a given time. Each agent $i$ is
defined by a unique identity number and its location on
the organism’s body, $\langle i_x, i_y, i_z \rangle$.

Figure 1: Example of cell discovery, damage detection, and
repair.

The specific shape of the evaluation organism, a Planarian
worm, is a 3D structure called a rhombic dodecahedral honeycomb.
One can imagine each cell as a hexagon with three
other hexagons stacked above it and three other hexagons
behind. Each cell is therefore a rhombic dodecahedron
and has at most 12 neighbor cells, which are stored in
a list $i_{Neighbors}$.
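The 12-neighbor structure can be made concrete with a small sketch. It relies on a standard geometric fact: the cell centers of a rhombic dodecahedral honeycomb form a face-centered cubic (FCC) lattice, whose nearest neighbors lie at the twelve offsets with two entries of ±1 and one zero. The integer coordinate convention (lattice points with an even coordinate sum) and the names `NEIGHBOR_OFFSETS` and `neighbors` are assumptions for illustration, not the coordinate system of the paper's implementation.

```python
from itertools import product

# The 12 nearest-neighbor offsets of the FCC lattice: every (dx, dy, dz)
# with entries in {-1, 0, 1} that has exactly one zero entry.
NEIGHBOR_OFFSETS = [
    o for o in product((-1, 0, 1), repeat=3) if sorted(map(abs, o)) == [0, 1, 1]
]


def neighbors(cell, body):
    """Return the cells of `body` adjacent to `cell` (at most 12)."""
    x, y, z = cell
    candidates = ((x + dx, y + dy, z + dz) for dx, dy, dz in NEIGHBOR_OFFSETS)
    return [c for c in candidates if c in body]


# A small block of FCC lattice points (even coordinate sum) as a toy body.
body = {p for p in product(range(5), repeat=3) if sum(p) % 2 == 0}
print(len(NEIGHBOR_OFFSETS), len(neighbors((2, 2, 2), body)), len(neighbors((0, 0, 0), body)))
```

Interior cells have the full 12 neighbors, while cells on the surface of the body have fewer, which is why the text says "at most 12".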

Cells hold packets and send them to their neighbors. A packet
$\beta$ contains a list of vectors $\beta_V$ of distance and direction
that describe the path that the packet has traveled across the
cell network. The vectors are organized in temporal order,
with the most recent vector at the top of the list. Each
cell contains one list of the packets received from
its neighbors during a cycle, $i_{ReceivedPackets}$, and a list of
packets the cell is holding, $i_{HeldPackets}$.

Each vector $v$ has an integer $v_{Distance}$ representing the
number of cells the vector traveled through, $v_{Direction}$
representing one of the twelve directions in which the vector
traveled, and $v_{Mode}$, which stores whether the packet is
charting its path and adding to $v$ or backtracking and taking
data from $v$.
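The entities just described map naturally onto a few small record types. The sketch below is one possible Python rendering, not the authors' Java implementation: the class and field names are ours, and treating the number of bends as the number of vectors minus one is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Mode(Enum):
    CHARTING = "charting"          # packet is extending its path
    BACKTRACKING = "backtracking"  # packet is retracing its path


@dataclass
class Vector:
    direction: int                 # one of the twelve directions (0-11)
    distance: int = 0              # number of cells traveled in that direction
    mode: Mode = Mode.CHARTING


@dataclass
class Packet:
    vectors: List[Vector] = field(default_factory=list)  # most recent first

    @property
    def top(self) -> Vector:
        return self.vectors[0]

    @property
    def bends(self) -> int:
        # Assumption: every vector after the first adds one bend.
        return max(len(self.vectors) - 1, 0)

    def add_vector(self, direction: int) -> None:
        self.vectors.insert(0, Vector(direction))


@dataclass
class Cell:
    ident: int
    location: Tuple[int, int, int]
    alive: bool = True
    received_packets: List[Packet] = field(default_factory=list)
    held_packets: List[Packet] = field(default_factory=list)


packet = Packet()
packet.add_vector(3)
packet.top.distance += 1   # traveled one cell in direction 3
packet.add_vector(5)       # bend into direction 5
print(packet.top.direction, packet.bends, packet.vectors[1].distance)
```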

At each cycle, each cell generates PacketFreq packets
and sends them to adjacent cells in random directions. When
a cell receives a packet, it increments its top vector’s distance.
For each packet received in a given cycle, the cell
will either (1) send the packet along the same direction as its
top vector’s direction, (2) send the packet in a new direction,
or (3) hold the packet. In order for a cell to hold a packet,
the packet must have at least MinVectorsToHold vectors, and the
top vector must have a distance of at least MinTopLen. If
the packet is not held, there is a BendProb probability that
the packet will be sent in a new direction. This new direction
should be different from the opposite direction of the
top vector’s direction.
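The three-way choice described above can be written as a small decision function. This is a sketch under stated assumptions: the function name `charting_step` is ours, directions are assumed to be indexed 0 to 11 so that opposite directions differ by 6, and the new direction is drawn uniformly from all directions other than the current one and its opposite.

```python
import random


def charting_step(num_vectors, top_distance, top_direction, *,
                  min_vectors_to_hold, min_top_len, bend_prob,
                  rng, num_directions=12):
    """Decide what a cell does with a charting packet.

    Returns ("hold", None), ("bend", new_direction) or
    ("forward", top_direction).
    """
    if num_vectors >= min_vectors_to_hold and top_distance >= min_top_len:
        return ("hold", None)
    if rng.random() < bend_prob:
        # Assumed indexing: the opposite of direction d is d + 6 (mod 12).
        opposite = (top_direction + num_directions // 2) % num_directions
        choices = [d for d in range(num_directions)
                   if d not in (top_direction, opposite)]
        return ("bend", rng.choice(choices))
    return ("forward", top_direction)


rng = random.Random(0)
print(charting_step(1, 5, 4, min_vectors_to_hold=2, min_top_len=3,
                    bend_prob=0.5, rng=rng))
```

Passing the random number generator explicitly keeps runs reproducible for a given seed, which matters when the same parameter point is simulated with several seeds.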

When a packet reaches a cell, the cell verifies the number
of bends until that moment. If this value is equal to
MinBends, then the packet will backtrack, regenerating
dead cells during this process.

The model runs as a discrete-time simulation for a defined
number of cycles, RunCycles. At each cycle, the cells
perform only two processes: sensing and acting. In the first,
they receive packets from their neighbors and decide whether the
packets will be held or sent (see Algorithm 1). Moreover, if
a packet is backtracking and the neighbor is dead, then that
neighbor is regenerated during the sensing process. The acting
process consists only of the cells sending packets to their neighbors
(see Algorithm 2).

In the proposed model, each cell creates packets to send
to its neighbors, which is the only interaction between agents.
This local interaction creates an emergent behavior of structure
maintenance in which cells along the travel path are restored.
As long as some packet eventually hits each dead cell,
the system is guaranteed to keep the structure intact.

The model contains two stochastic procedures. The first occurs
when a cell needs to decide the direction of a packet (a new
packet or a received packet that needs to change direction).
The second is the random death of cells, which will be
explained in the next section.

Algorithm 1 Pseudocode of the sensing process performed
by the cells.

Sense(i)
  for all packet β ∈ i.ReceivedPackets do
    top ← β.TopVector
    if top.Mode == Charting then
      top.Distance ← top.Distance + 1
      if β.Bends > MinVectorsToHold
          and top.Distance > MinTopLen
          and isAlive(i.Neighbors[top.Direction]) then
        i.HeldPackets.add(β)
      else
        if random() < BendProb then
          β.addVec(getNewDirection(top.Direction))
        else
          β.addVec(top.Direction)
        end if
        i.SendingPackets.add(β)
      end if
    else
      top.Distance ← top.Distance − 1
      if top.Distance < 0 then
        i.ReceivedPackets.pop()
      end if
      if top ≠ nil then
        if !isAlive(i.Neighbors[reverse(top.Direction)]) then
          regenerateCell(i, reverse(top.Direction))
        end if
        i.SendingPackets.add(β)
      end if
    end if
  end for
  if i.HeldPackets.size() > MinBends then
    for all packet β ∈ i.HeldPackets do
      β.Mode ← Backtracking
      i.SendingPackets.add(β)
    end for
  end if

Algorithm 2 Pseudocode of the acting process performed
by the cells.

Act(i)
  for all packet β ∈ i.SendingPackets do
    top ← β.TopVector
    if top.Mode == Backtracking then
      top.Direction ← reverse(top.Direction)
    end if
    if isAlive(i.Neighbors[top.Direction]) then
      sendPacket(i, top.Direction, β)
    end if
  end for
  i.SendingPackets.clear()

Simulation Experiments 

The goal of the experimental evaluation was to see whether
the proposed cell-to-cell communication mechanism would
be sufficient to maintain the structure of an organism over
time in light of random cell death. The model was implemented
in our Java-based agent-based SimWorld simulation
environment (Scheutz and Harris, 2011).1 For all simulation
runs, we consider a prototypical 3D Planarian-like
structure with a fixed shape of 8 layers containing 339 cells
each, resulting in 2712 cells total (the top-most layer of cells
of the employed shape is depicted in Figure 2).



Figure 2: Shape of the topmost layer of the worm containing 
339 cells. 

To simulate the process of structural deterioration (e.g.,
due to a toxic or radioactive environment, or the natural aging
and death of cells), we fixed a particular cycle in the
simulation at which this process would start to occur
(DeathTime = 80). At the moment that a cell dies, all of its held
packets are lost, and consequently it cannot transmit other packets
that reach it later. To verify whether enough of the structure
of the organism’s body was still intact, we fixed the
Threshold at 90% of alive cells for the entire simulation,
i.e., for the organism to be considered “intact” at least 90%
of its cells must be alive at any given cycle.
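The deterioration process and the intactness criterion can be sketched as two small functions. This is an illustrative reading of the description, not the SimWorld code: the names `apply_random_death` and `is_intact` are hypothetical, cells are assumed to die independently of one another, and we assume death starts at the cycle equal to DeathTime rather than the cycle after it.

```python
import random


def apply_random_death(alive, cycle, death_prob, death_time, rng):
    """Return the cells that survive this cycle.

    Before `death_time` nothing dies; afterwards each living cell dies
    independently with probability `death_prob`.
    """
    if cycle < death_time:
        return set(alive)
    return {cell for cell in alive if rng.random() >= death_prob}


def is_intact(alive, total_cells, threshold=0.90):
    """The organism counts as intact if at least `threshold` of its cells live."""
    return len(alive) >= threshold * total_cells


rng = random.Random(42)
body = set(range(2712))  # 8 layers of 339 cells, identified by index here
survivors = apply_random_death(body, cycle=80, death_prob=0.02,
                               death_time=80, rng=rng)
print(len(survivors), is_intact(survivors, len(body)))
```

With 2712 cells, the 90% threshold corresponds to at least 2441 living cells.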

1 SimWorld is a versatile environment with support for graphical
and batch runs of models. It is easy to program and easy to
extend, and it provides an interactive graphical user interface
for inspecting agent behavior (and novel mechanisms for
playing a simulation forward and backward, which support the
modeller in detecting interesting emergent behaviors). SimWorld
has been under development in our lab for over a decade.

The function isAlive verifies whether a specific cell is alive and
can transmit packets. If a cell tries to send a packet which
has its top vector in the Backtracking mode, and the neighbor
cell supposed to receive this packet is not alive, then
the alive cell calls the function regenerateCell, which “revives”
the dead neighbor. The function getNewDirection
randomly chooses a new direction distinct from the direction
passed as a parameter and also distinct from the reverse of
this direction, to ensure that the packet does not return to
the cell it came from. Here, reverse is a function that, given the
direction to one side of the dodecahedron, returns the direction
to the opposite side of this polyhedron. Finally, the function
sendPacket adds the packet to the ReceivedPackets
list of the cell that lies in the direction of the top vector.
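The two direction helpers can be sketched as follows. This is an illustration, not the paper's implementation: it assumes the twelve directions are indexed into a list of face-centered cubic neighbor offsets (two entries of ±1 and one zero), so that the reverse of a direction is simply the negated offset. The list name `DIRECTIONS` and the snake_case function names are ours.

```python
import random
from itertools import product

# Assumed direction table: the 12 offsets towards the faces of a rhombic
# dodecahedron, i.e. all (dx, dy, dz) in {-1, 0, 1}^3 with exactly one zero.
DIRECTIONS = [
    o for o in product((-1, 0, 1), repeat=3) if sorted(map(abs, o)) == [0, 1, 1]
]


def reverse(direction):
    """Index of the direction pointing to the opposite face."""
    dx, dy, dz = DIRECTIONS[direction]
    return DIRECTIONS.index((-dx, -dy, -dz))


def get_new_direction(direction, rng=random):
    """Pick a random direction that is neither `direction` nor its reverse."""
    excluded = {direction, reverse(direction)}
    return rng.choice([d for d in range(len(DIRECTIONS)) if d not in excluded])


print(reverse(0), get_new_direction(0, random.Random(7)))
```

Excluding the reverse direction is what prevents a bending packet from stepping straight back into the cell it just left.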

Simulation runs can terminate in two different cases. First,
at every cycle during the simulation, the organism must have at
least Threshold percent of its cells alive; otherwise the simulation
stops. Second, the simulation stops when it reaches
the pre-determined limit of 500 cycles (if the organism can
maintain its structure through 420 cycles, then we assume it
can do so indefinitely, at least in approximation).

To explore the parameter space of the model, we first
varied the probability of a cell dying on a given cycle
(DeathProb) in order to simulate the death of cells
as time passes. For our experiments, DeathProb ∈
{0.0, 0.01, 0.02, 0.03, 0.04}. For example, with a 2% death
rate per cell per cycle, every cell will on average die every 50
cycles, or 10 times in the course of the 500-cycle simulation.
Since there are 2712 cells in the body, over 54 cells will die
on average in any given cycle, which is significant structural
damage that accrues over time if not repaired quickly.
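The figures in this example follow from simple expectations, as the short calculation below shows. It uses the same simplifications as the text: all 2712 cells are taken to be alive, and the death rate is applied over the full 500 cycles.

```python
death_prob = 0.02    # probability that a given cell dies in a given cycle
total_cells = 2712   # 8 layers of 339 cells
cycles = 500         # length of a full simulation run

# Mean waiting time of a geometric process with success probability p is 1/p.
mean_cycles_between_deaths = 1 / death_prob
expected_deaths_per_cell = cycles * death_prob
expected_deaths_per_cycle = total_cells * death_prob

print(mean_cycles_between_deaths, expected_deaths_per_cell,
      round(expected_deaths_per_cycle, 2))
```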

We also varied the number of new “packets” a cell produces
on each cycle (PacketFreq). In our experiments, we
varied PacketFreq ∈ {1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31}.

In order to control the variety of navigation patterns of
packets, we varied the minimum size of the vector of bends
before a packet can backtrack (MinBends); the minimum
length the top vector of a packet should have to be able to bend
by adding a vector in a new direction (MinTopLen); and the
probability that a packet will bend, given that the top vector
length is at least MinTopLen (BendProb). For our experiments,
MinBends ∈ {1, 3, 5, 7}, MinTopLen ∈ {1, 3, 5, 7} and
BendProb ∈ {0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}.

As our model has stochastic processes, we need to explore
this parameter space using different random number generator
seeds. Thus, for each point in the parameter space, we
ran 8 different simulations, resulting in a total of 63360
simulations (5 × 11 × 4 × 4 × 9 parameter combinations, each
run with 8 seeds). The dependent variable was the number of cycles
the simulation ran with the fraction of alive cells above Threshold.
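The size of the experimental design can be checked by enumerating the grid. The sketch below assumes a full factorial design over the value sets listed above, with the same number of seeds at every point; the dictionary keys mirror the parameter names in the text, while the variable names are ours.

```python
from itertools import product

grid = {
    "DeathProb": [0.0, 0.01, 0.02, 0.03, 0.04],
    "PacketFreq": [1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31],
    "MinBends": [1, 3, 5, 7],
    "MinTopLen": [1, 3, 5, 7],
    "BendProb": [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
}
seeds = 8

points = list(product(*grid.values()))  # one tuple per parameter combination
runs_total = len(points) * seeds
# DeathProb is the first entry of each tuple.
runs_with_death = sum(1 for p in points if p[0] > 0.0) * seeds

print(len(points), runs_total, runs_with_death)
```

Under these assumptions, the runs with DeathProb > 0.0 number 50688, which is the number of data points analyzed in the results.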

Results and Analyses 

Of the 50688 data points with DeathProb > 0.0,
28961 maintained at least 90% of cells alive during
the whole simulation, i.e., 500 cycles. More specifically,
11801 of these points had DeathProb = 0.01, 9685 had
DeathProb = 0.02, 5802 had DeathProb = 0.03 and 1673 had
DeathProb = 0.04, as shown in Figure 3. These results show
that there exists a region of the parameter space in which our
model can repair dead cells and maintain the individual’s
structure indefinitely. The mean number of cycles with the
number of alive cells above 90% for all simulations
was 388.521.
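The reported counts can be turned into survival rates with a few lines of arithmetic. The counts are taken from the text; the assumption that the 50688 runs are split evenly across the four non-zero death probabilities (12672 runs each) is ours.

```python
# Runs that stayed above the threshold for all 500 cycles, per DeathProb.
survived = {0.01: 11801, 0.02: 9685, 0.03: 5802, 0.04: 1673}

total_runs = 50688
runs_per_death_prob = total_runs // len(survived)  # assumes an even split

total_survived = sum(survived.values())
overall_rate = total_survived / total_runs
rates = {p: n / runs_per_death_prob for p, n in survived.items()}

print(total_survived, f"{overall_rate:.1%}")
for p, rate in rates.items():
    print(p, f"{rate:.1%}")
```

Read this way, a little over half of all runs with cell death stay intact, and the share of intact runs drops steadily as DeathProb increases.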

In order to compare the main effects of each independent
variable on the cycles above threshold, we performed
an ANOVA with PacketFreq, MinBends, MinTopLen, BendProb
and DeathProb as independent variables and
CyclesAboveMin as the dependent variable. The ANOVA shows
significant main effects for all independent variables other
than BendProb. Significant two-way, three-way and four-way
interactions among variables other than BendProb
were also found. These results confirm our hypothesis that
variance in the packet vector is not relevant for the process
of structure maintenance.

Figure 3: Histogram of points which maintain the structure
of the worm after 500 cycles.

Figure 4: Interaction between RandomDeath and CyclesAboveMin
for each number of produced packets per cycle.

As expected, there was a significant negative relationship
between DeathProb and CyclesAboveMin, as shown in Figure 4.
We also found a positive correlation between PacketFreq
and CyclesAboveMin, as shown in Figure 5. Increasing the
value of PacketFreq means that more variations of possible
packets are explored, but from a certain point on, increasing this
value generates more redundant packets than novel ones,
so performance levels off at an asymptote. The point beyond
which performance no longer changes depends on the probability
of cell death: as the probability of cells deteriorating
increases, fewer packets are redundant, i.e.,
more packets are necessary to maintain the structure of the
organism.



Figure 5: Interaction between PacketFreq and CyclesAboveMin
for each probability of a random cell death.

Regarding MinBends (see Figure 6), a moderate number
of minimum bends before a packet can backtrack
(MinBends = 3) performed best. This value reflects a
tradeoff: a longer packet covers a larger
area of the individual but is also more at risk of being lost
to random cell deaths.

Figure 7 shows the interaction between MinTopLen and
CyclesAboveMin. It is important to note that as MinTopLen
increases, the length of the packet increases, and the
packet must therefore spend more time traversing before
backtracking. Consequently, the chance increases that a packet
is lost to random death before it can repair another dead
cell. Thus, for lower values of RandomDeath, it is good to
have high values of MinTopLen. However, as RandomDeath
increases, the tradeoff between better coverage
and the risk of longer packets shifts, and for RandomDeath
= 0.04, MinTopLen = 1 performs best.

Figure 6: Interaction between MinBends and CyclesAboveMin
for each probability of a random cell death.

The interaction between MinTopLen and MinBends shows
an optimal combination with MinTopLen = 1 and MinBends
= 3 (see Figure 8). Inverting the values of the two variables
reduces the performance, even though these two combinations
yield the same total packet length. The explanation for
this is that the same length of packet can cover a wider space
if it has more bends. This tradeoff is most pronounced when
changing from a single bend to two bends.

Discussion 

Our results show that organisms were able to maintain
their structure using the proposed cell-to-cell communication
mechanism for the right set of parameters: a high
PacketFreq > 22, a moderate MinBends = 3, and a low
MinTopLen = 1, with the value of BendProb not being relevant.
We hypothesize that, without modifying the algorithm, it is
possible to regenerate the worm from various more systematic
cuts as well, where a large part of the body is removed.
For such a lesion to be healed, the packets residing in alive
cells in the remaining body would have to be such that their
collective paths cover all excised cells, which would
then be regenerated during backtracking.

Space limitations allowed us to discuss only one particular
structure, but the proposed mechanisms are general enough
to work for a very large set of structures. Whether a structure
will be maintainable will effectively depend on both
how cells die (e.g., randomly or because of lesions cutting
off whole segments of the body) and how many bends packets
can have, which they will need to recover complex structures
that require many path segments to hit all component
cells (e.g., to regenerate a cut-off arm, packets need to travel
through the upper arm, the lower arm, the wrist, the palm,
and the various finger segments, thus requiring a larger number
of segments in the packet).

Figure 7: Interaction between MinTopLen and CyclesAboveMin
for each probability of a random cell death.

Bi-directional cell communication in vivo takes place via several kinds
of physical media (Edelstein et al., 2016), including chemical signals
(diffusible molecules), physical forces (pressures and tensions), and
bioelectric signaling (voltage gradients) (Levin, 2012). The latter is
especially interesting because it enables many of the functions
described in our model (Funk, 2013). Indeed, brains evolved by
specializing such communication functions, which were present from the
dawn of multicellularity, and optimizing them for communication and
message-passing functions in the central nervous system (Keijzer et
al., 2013). The more ancient form, developmental bioelectricity (Bates,
2015), is a modality by which collections of cells communicate, store
memory, and make group decisions about growth and form during
embryogenesis and regeneration (Pezzulo and Levin, 2015). Using
proteins such as ion channels and pumps, cells regulate their
bioelectric dynamics (Levin, 2014; Mustard and Levin, 2014). Moreover,
using electrical synapses (gap junctions), cells can detect the
presence and physiological state of neighbors (Palacios-Prado and
Bukauskas, 2009). Communication via gap junctions has recently been
shown to exert significant instructive control over growth and form
during regeneration in planaria and other model systems (Emmons-Bell et
al., 2015).



Figure 8: Interaction between MinTopLen and CyclesAboveMin for each
minimum number of bends before backtracking.


Conclusion 


In this paper we introduced the first agent-based model of structure
discovery and repair, which allows 3D cell structures to discover their
organization and repair damage caused by cell death. We demonstrated
the efficacy of the mechanism in a large set of simulations of random
cell death occurring at different rates in a simulated body shaped like
a planarian. Even for high cell death rates, we found parameters for
the proposed cell-to-cell communication mechanism that could maintain
the structure indefinitely.

As a next step, we would like to verify how the model 
behaves with non-equally distributed cell death, i.e., where 
a cluster of adjacent cells dies at the same time due to, for 
example, the action of some toxin or an impact on a specific 
area of the organism. If for all dead cells there is a remaining 
packet held by an alive cell, then all cells can be regenerated. 

In addition, we intend to investigate the regeneration from cuts that
in vivo worms exhibit. It is well known that planaria are capable of
regenerating from fragments as small as 1/279th of the intact animal
volume (Morgan, 1898). Our hypothesis is that there exists a parameter
assignment by which our model is capable of regenerating structure from
the simultaneous death of a large area of cells and also from any
number of cuts.

Abstract

We present the use of a new computationally efficient 3D physics model
for the simulation of cells in a virtual aquatic world. In this model,
cells can freely assemble and disconnect throughout the simulation,
unlike most evo-devo models, which separate the development and
evaluation stages and only consider one cell cluster. While allowing
for the discovery of interesting behaviors through the addition of new
degrees of freedom, this 3D center-based physics engine and its
associated virtual world also come with drawbacks when applied to
evolutionary experiments: a larger search space and numerous local
optima. In this paper, we design an experiment in which cells must
learn to survive by keeping their genome alive as long as possible in a
demanding world. No morphology or strategy is explicitly enforced; the
only objective the cells have to optimize is the survival time of the
organism they build. We show that a novelty metric, adapted to our
evo-devo setting, dramatically improves the outcome of the evolutionary
runs. This paper also details some of the developmental strategies the
evolved multicellular organisms have found in order to survive.

Introduction 

Over the past two decades, the artificial life community has seen the
development of several models for the simulation of environments in
which cells can freely evolve. Many 2-dimensional models have been
used, mainly for their simplicity and their computational efficiency
(Doursat, 2009; Joachimczak et al., 2013), but also because they are
often sufficient to let interesting cellular behaviors emerge. With the
addition of the third dimension come both large possibilities in the
exploration of artificial life and the exciting opportunity to more
precisely compare and understand real world observations. While there
are several 3D physics engines and simulators developed specifically
for artificial life (Joachimczak and Wrobel, 2011; Fontana and Wrobel,
2013; Doursat and Sanchez, 2014; Cheney et al., 2014), combining
low-scale features of cells with efficient simulation at the scale of a
whole organism can prove challenging. It requires either ignoring
interesting aspects of cells, such as their polarisation system,
complex adhesive properties or variable stiffnesses, or abandoning
computational efficiency.


Of course, many models of cellular simulation are not directly linked
to the ALife community (although some have been used for artificial
life experiments) and are more tightly related to bio-simulation,
strongly focusing on the realism of the simulations they produce. Over
the years, many models have been developed using various approaches,
among which are 2D lattice-based cellular automata (Ouchi et al.,
2003), various off-lattice 3D center-based models and even precise
hybrid multi-scale systems which combine cell-level deformations with
tissue-scale constraints (Lowengrub et al., 2009), to cite just a few.
In the context of artificial life, and specifically when growing
multicellular artificial organisms, the complexity of the simulated
world directly impacts the developmental strategies and possible
morphologies of the creatures. While this can make for some behaviors
and strategies that are more desirable, and might also help in the
understanding of real-life behaviors by bringing more realism, it also
comes with at least two obvious trade-offs. First, adding realism and
complexity to the artificial world will often increase the required
computational power, which is a resource of prime importance when using
genetic algorithms that require the simulation of thousands of
instances of these worlds. Secondly, and still in the context of
artificial evolution, adding complexity to the world can dramatically
broaden the search space, requiring even more simulations for
evolutionary algorithms to come up with a convincing organism, and
further complicating the fitness landscape. It can thus be argued that
the simulation of cells for the growth of artificial multicellular
organisms is, while sharing obvious common roots, a different problem
from the simulation of real world cells. In this context, while we take
our inspiration from biology when designing a cell simulation engine,
it is of prime importance to keep these trade-offs in mind and to try
to see where the truly desirable features lie, those from which an
evolved multicellular organism might benefit, and those that can be
simplified.

In this work, we propose to set up artificial life experiments in a
3-dimensional world using a fast cellular physics engine tailored to
artificial life, MecaCell, that offers dynamic cell-cell interactions
such as collision, adhesion and an approximation of volume conservation
while keeping the computational cost within reasonable limits. We have
designed an experiment in which the virtual multicellular organism will
have to face many local optima created by both the added degrees of
freedom and the rules of the world in which it evolves. We show how
novelty search with a morphology metric can, when used in conjunction
with a fitness function, help overcome many of these local optima. The
experiment we present in this paper challenges one cell to preserve its
genetic material in a sea-like environment as long as possible. In
order to do so, the cell (which can choose to eventually become an
organism after division) will have to face harsh conditions where
energy is a difficult resource to harvest. Organisms, or rather
same-DNA cell colonies, will thus have to balance their in-water
morphology to collect light energy while maintaining solid roots in the
ground in order to collect a second essential type of energy. While
division of labor might play a determining role in the survival of the
colony (harvesting nutrients and light, sharing energy, maintaining the
structure of the organism), the rules of the simulated world should
make for the appearance of different viable strategies. In line with
our previous work (Disset et al., 2014), and to reduce the clues
provided by a heavily engineered fitness function as much as possible,
the cell controllers, based on gene regulation, are only evolved for
survival (duration of the simulation). In addition, we study the impact
of a novelty search criterion.

Simulated world

This section presents the different aspects of the simulated world we
propose to investigate¹. The main goal is to try various
characteristics of the physics engine and to explore ways to mitigate
the adverse effect of added degrees of freedom (compared with a 2D
simulator or a 3D cell simulator which doesn't account for precise
dynamic adhesions, for example). We want our virtual organisms to be
able to evolve efficient and varied solutions to the problem of
survival in a constrained environment.

Cell physics - MecaCell

MecaCell² aims to be an artificial-life-friendly and generic platform
for the 3D simulation of cells. Its goal is to provide a continuous
physics environment that is computationally efficient and versatile
enough to tackle various ALife experiments and configurations (with
exotic or simplified physics rules, for example).

¹ All the source code as well as images and videos are available at
https://github.com/jdisset/seacells

² MecaCell is written in C++ and available (under LGPL license) at
https://github.com/jdisset/MecaCell. It includes a custom OpenGL
display engine with a plugin system for the extensibility of its
interface.

Cell and volume conservation  In MecaCell, each cell is an agent
represented by a center, a membrane and an orientation. A cell can
freely evolve in a 3D continuous environment, where it will collide
with and adhere to other cells. Here we consider cells to be spherical
objects filled with a mostly incompressible fluid and wrapped in an
elastic membrane. Every cell has a rest radius $R_r$ and a dynamic
radius $R_d$. The dynamic radius was introduced to enable an
approximation of volume conservation: at each time step $t$, if a cell
is cut (overlapping either another cell or a 3D object), we recompute
both its membrane surface area $A_t$ and its current volume $V_t$. The
net difference in volume (relative to its rest value $V_r$) is then
translated into a pressure stress $p_t$:

$$p_t = \frac{I \times (V_t - V_r)}{A_t}$$

where $I$ is the compressibility coefficient of the cell. Cell pressure
acts as a force governing the growth of $R_d$. When pressure increases
under stress, the cell will compensate by expanding its radius in order
to recover its original volume. This variation naturally implies a
modification of its current membrane surface area $A_t$, which will
also act as a shrinking force on the dynamic radius. The cell membrane
is thus, in a computationally efficient manner, brought into
equilibrium between volume conservation and surface area conservation,
using the following explicit integration scheme:

$$R_{d,t} = R_{d,t-1} + \Delta t^2 \times \left(\Delta V - \Delta A - \frac{\partial R_d}{\partial t} \times C\right)$$

where $\Delta V$ is the volume variation $V_t - V_r$, $\Delta A$ is the
surface area variation $A_t - A_r$ and $C$ is a damping coefficient.

Collisions  In this model, collisions are easily handled by detecting
two overlapping cells and by computing the normal and the area of the
resulting contact plane. Each cell will then push on the other
perpendicularly to this plane and according to their internal pressure
(resulting from their deformation). The intensity of the force applied
between a cell $C_a$ of internal pressure $p_a$ and a cell $C_b$ (with
internal pressure $p_b$) through a contact plane of area $A_c$ is given
by:

$$\|F\| = A_c \times (P_a + P_b), \quad \text{with } P_i = \begin{cases} 0, & \text{if } p_i < 0 \\ p_i, & \text{otherwise} \end{cases}$$

A tunable damping term $C_{col}$ is also added.
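
The pressure stress and the collision force given in this section can
be illustrated with a minimal Python sketch. It is not taken from
MecaCell (which is written in C++); the function names and the numeric
values are hypothetical, and the formulas are transcribed as printed,
including their sign convention.

```python
def pressure_stress(volume, rest_volume, surface_area, compressibility):
    """Pressure stress p_t = I * (V_t - V_r) / A_t, as written in the text."""
    return compressibility * (volume - rest_volume) / surface_area


def effective_pressure(p):
    """P_i in the collision formula: negative pressures count as zero."""
    return 0.0 if p < 0 else p


def collision_force(contact_area, p_a, p_b):
    """Magnitude ||F|| = A_c * (P_a + P_b) of the push between two cells."""
    return contact_area * (effective_pressure(p_a) + effective_pressure(p_b))


# Illustrative values only (hypothetical, not from the paper).
p_a = pressure_stress(volume=1.2, rest_volume=1.0,
                      surface_area=2.0, compressibility=10.0)
p_b = pressure_stress(volume=0.9, rest_volume=1.0,
                      surface_area=2.0, compressibility=10.0)
force = collision_force(contact_area=0.5, p_a=p_a, p_b=p_b)
```

The damping term $C_{col}$ mentioned in the text is left out of this
sketch because its exact form is not given.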


Adhesions  When simulating artificial multicellular organisms in a 3D
environment, the capability to maintain oriented connections is of
prime importance. In MecaCell, cell-cell adhesions use the same kind of
contact planes as the collisions. A cell can choose the distribution of
its adhesive properties across its membrane through the definition of
an adhesion function $f_{adh}$, which associates an adhesive receptor
density $d_{adh}$ to a unit vector expressed in the local coordinate
system of the cell (and represents the adhesive potential at a given
membrane location). We simulate an adhesion between two cells by
creating a dynamic mass-spring-damper system of length 0, attached to
the centers of the contact surfaces on both cells' membranes. This
spring acts on both membranes, but all of the generated forces and
moments are applied at the respective cells' centers. When the two
adhesive cells move closer to each other, the centers of the adhesion
planes are updated, as well as all the mechanical properties of the
adhesion mass-spring-damper system. The stiffness $K$ and damping
coefficient $C$ are proportional to the contact plane surface area as
well as the average receptor density on said surface (and to the
intrinsic characteristics of these receptors, which can be different
for every cell, or favor certain cell-cell affinities between cellular
types). When two adhesive cells are pulled (or rotated) apart, the
adhesive dynamic mass-spring-damper system can elongate up to a certain
length defined by the maximum length reachable by an adhesion receptor.
Thus, if the cells are pulled apart too strongly (relative to the
strength of their connection), they can actually come out of contact
again. Similarly, if they experience too strong a torque or a shear
stress above a certain threshold, they will be able to slide on each
other's membrane (the centers of their adhesion planes will have moved
too far apart due to rotation).

Environment - Ground and sea 

In addition to the cellular physics model presented previously, the
world of this particular experiment is divided into two parts: the
ground and the sea.

Ground  The ground is a dense medium in which cells cannot easily move.
In order to achieve this effect, we used a special integrator which
does not take into account any inertia term, using only the force
exerted on each cell to compute its next position. This ground acts as
a solid when the forces exerted by the cells are below a certain
threshold, only allowing cells to move if they push hard enough. This
is a depiction, albeit a simplified one, of the mechanical
characteristics of dense mud.
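
One possible reading of this inertia-free, thresholded integrator is
sketched below in Python. The first-order update rule and the
`mobility` coefficient are assumptions made for illustration; the paper
does not give the exact form used in its simulator.

```python
import math


def ground_step(position, force, dt, threshold, mobility=1.0):
    """One inertia-free update of a cell position inside the ground.

    Assumed rule (hypothetical): no velocity is stored; the displacement
    is proportional to the applied force, and the cell stays put when
    the force magnitude is below `threshold` (the mud acts as a solid).
    """
    magnitude = math.sqrt(sum(f * f for f in force))
    if magnitude < threshold:
        return tuple(position)
    return tuple(x + mobility * f * dt for x, f in zip(position, force))


# A weak push leaves the cell in place, a strong push moves it.
stuck = ground_step((0.0, 0.0, 0.0), (0.1, 0.0, 0.0), dt=0.5, threshold=1.0)
moved = ground_step((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), dt=0.5, threshold=1.0)
```

Because no velocity is carried over from one step to the next, a cell
stops as soon as the force on it drops below the threshold.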

The ground contains nutrients, which are not available in the water.
They are present in the mud at various depths, in small areas and
finite amounts. At the beginning of the simulation, we initialize
N = 200 nutrient sources. For a given nutrient source $i$ placed at a
random position $(x_i, y_i, z_i)$ in the mud, the initial amount of
nutrient $n_i$ is given by:

$$n_i = Q_n \times (1 + C_n \times |y_i|^{P_n})$$

where $Q_n$ is a constant and $P_n$ and $C_n$ are two parameters that
determine how the amount of nutrient varies for each nutrient source
according to its depth $y_i$. This is meant to mimic how the nutrient
distribution can differ according to the type of soil. It also allows
for the tuning of some aspects of the fitness landscape: with
$C_n < 0$, the selective pressure would force the cells to expand
laterally, while a positive value of $C_n$ should favor vertical growth
to find more reliable sources of nutrients. In this particular
experiment, we use $Q_n = 0.03$, $P_n = 1.5$ and $C_n = 0.035$. These
values have been chosen empirically in order to create an environment
in which organisms can easily survive for a short amount of time but
must develop complex strategies to survive longer.
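
As an illustration, the initial nutrient amount can be computed as in
the following Python sketch. The parameter values are those quoted in
the text; the world bounds (`max_depth`, `half_width`), the random seed
and the function names are hypothetical.

```python
import random


def initial_nutrient(depth, q_n=0.03, c_n=0.035, p_n=1.5):
    """n_i = Q_n * (1 + C_n * |y_i| ** P_n) for a source at depth y_i."""
    return q_n * (1.0 + c_n * abs(depth) ** p_n)


def make_sources(n_sources=200, max_depth=100.0, half_width=100.0, seed=0):
    """Place sources at random positions in the mud (bounds are assumed)."""
    rng = random.Random(seed)
    sources = []
    for _ in range(n_sources):
        x = rng.uniform(-half_width, half_width)
        y = -rng.uniform(0.0, max_depth)  # y is the vertical axis, mud is below 0
        z = rng.uniform(-half_width, half_width)
        sources.append((x, y, z, initial_nutrient(y)))
    return sources


sources = make_sources()
```

With a positive $C_n$, as used here, deeper sources start with more
nutrient than shallow ones.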

Sea  The second layer of the world is placed on top of the ground. We
call it water because its mechanical characteristics, namely density
and viscosity, are supposed to mimic those of a still body of water.
Here, a classic semi-implicit Euler integration scheme is used to
update the cell positions and orientations. For computational
efficiency purposes, no flows are simulated in this water. However, the
cells are all slightly buoyant, which means that they need to keep
adhesions to cells that are still inside the mud in order to avoid
being carried away.
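
For reference, a semi-implicit (symplectic) Euler step updates the
velocity first and then uses the new velocity to update the position.
The Python sketch below shows the translational part only; orientations
and the actual force model of the simulator are not covered, and the
function name is hypothetical.

```python
def semi_implicit_euler(position, velocity, force, mass, dt):
    """One semi-implicit Euler step for a point mass.

    The velocity is advanced with the current force, then the position
    is advanced with the *updated* velocity.
    """
    new_velocity = tuple(v + (f / mass) * dt for v, f in zip(velocity, force))
    new_position = tuple(x + v * dt for x, v in zip(position, new_velocity))
    return new_position, new_velocity


# Illustrative values: a cell drifting along x with an upward push along y.
pos, vel = semi_implicit_euler(position=(0.0, 0.0, 0.0),
                               velocity=(1.0, 0.0, 0.0),
                               force=(0.0, 2.0, 0.0),
                               mass=2.0, dt=0.5)
```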

Light is abundantly available in the water but stopped by the ground.
It only comes in straight rays, perpendicular to the ground, and if one
light ray shines upon a given cell, it won't be able to reach any other
cell below that first one. In other words, cells block light and their
shadows prevent other cells from being lit. We implemented this feature
using a classical depth-buffer and depth-culling algorithm.
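
The shadowing rule can be approximated on the CPU with a small depth
buffer, as in the Python sketch below. This is only an illustration of
the idea: the grid resolution `pixel`, the data layout and the function
name are assumptions, and the actual implementation most likely relies
on the graphics pipeline.

```python
import math


def lit_cells(cells, pixel=0.5):
    """Return the indices of cells reached by at least one vertical ray.

    `cells` is a list of (x, y, z, radius) spheres with y as the height.
    Each pixel of a horizontal grid keeps only the highest surface seen
    from above, like a depth buffer looking straight down.
    """
    depth = {}  # (ix, iz) -> (height of the topmost surface, cell index)
    for idx, (x, y, z, r) in enumerate(cells):
        for ix in range(math.floor((x - r) / pixel), math.floor((x + r) / pixel) + 1):
            for iz in range(math.floor((z - r) / pixel), math.floor((z + r) / pixel) + 1):
                cx, cz = (ix + 0.5) * pixel, (iz + 0.5) * pixel
                d2 = (cx - x) ** 2 + (cz - z) ** 2
                if d2 > r * r:
                    continue  # this pixel centre is outside the sphere
                top = y + math.sqrt(r * r - d2)
                if (ix, iz) not in depth or top > depth[(ix, iz)][0]:
                    depth[(ix, iz)] = (top, idx)
    return {idx for _, idx in depth.values()}


# Cell 1 sits directly under cell 0 and is fully shadowed; cell 2 is apart.
cells = [(0.0, 5.0, 0.0, 1.0), (0.0, 2.0, 0.0, 1.0), (10.0, 1.0, 0.0, 1.0)]
lit = lit_cells(cells)
```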

Cells 

Cell life cycle  In order to survive in this world, a cell has to
fulfill one requirement: all its energy levels must stay above zero. In
this particular experiment, a cell needs to handle two forms of energy:
light and nutrient. At the initialisation stage, we place one unique
"seed" cell in the mud, just below the water (precisely one cell
diameter deep). When the simulation starts, the seed cell has maximum
levels of light and nutrient, mimicking the seed endosperm (which
provides the initial energy to the seed). At each time step, every cell
consumes a fixed amount of light and nutrient energy. When either of
the two energy levels reaches 0, the cell dies.

We implemented a simplified cell cycle in which every cell can choose
between three actions: growth, quiescence and apoptosis. This life
cycle is controlled by an aGRN that will be detailed at the end of this
section. When in quiescent mode, the cell consumes a normal amount of
nutrients and light. When choosing apoptosis, the cell disappears and
all the nutrients and light it contained are lost. When a cell enters
its growth phase, it grows (while consuming 20% more energy) until its
volume has doubled, at which point division happens along a particular
axis, determined by the cell's aGRN. When division occurs, the mother
cell is replaced by two identical daughter cells whose energy levels
are exactly half those of the mother cell at the time of division. Only
one variable, the age of the cell, differs between the two daughter
cells: one is kept, the other is reset to zero. This variable is
incremented at each time step and is an input to the cells' aGRN.

Energy  Nutrients and light are not available at the same place, which
means the cells of our organism need to be able to absorb nutrients and
light and share that energy with each other. More generally, a cell
with large quantities of energy should be able to transfer part of it
to any cell in need. In this experiment, we approximate this process
through a passive diffusion based on Darcy's law, which describes the
flow of an incompressible fluid through a porous isotropic medium in
the laminar case (which is arguably the case here given the low
Reynolds numbers involved). The energy (nutrient or light) flow $F_n$
between two connected cells $a$ and $b$ is thus described by the
following equation:

$$F_n = \frac{-k \times A \times \Delta p}{\mu L}$$

where $\Delta p$ is the energy's pressure drop (here approximated by
the difference in levels $n_b - n_a$ or $l_b - l_a$, where $n_x$ and
$l_x$ are respectively the nutrient and the light level of cell $x$)
between cell $b$ and cell $a$. This flow is also determined by the
intrinsic permeability of the medium $k$, the viscosity of the
nutritive fluid $\mu$, as well as the connection area $A$ and the
distance $L$ between the two cells' centers. The value of this flow is
computed at each time step for each active connection (i.e. real
adhesion) between two cells using an explicit integration scheme. Using
the free surface area of a cell's membrane, we also use this diffusion
system to simulate the absorption of both light and nutrients from the
environment. Any lit cell will perceive a light intensity proportional
to its elevation (above the ground) up to a certain altitude, where
this intensity is capped at one. Inside the ground and from any cell
positioned at $P_c$, the available nutrient concentration $A_s$ coming
from a nutrient source $s$ at position $P_s$, with current absolute
nutrient content $C_t$, initial diffusion radius $R_{t0}$ and initial
content $C_{t0}$, is given by:

$$A_s = C_t \times \left(1 - \left(|P_s - P_c| / R_{t0} \times (C_t / C_{t0})\right) \times C_t / C_{t0}\right)$$

Morphogens  Bio-inspired communication through the diffusion of
molecules in the environment has successfully been used in numerous
artificial life experiments and has proven to be an efficient way to
enable information transmission between agents. While some authors use
detailed and realistic diffusion of signalling molecules, here we use a
simple instantaneous diffusion system. Every cell can emit one or
several of $N_m$ morphogens through the $o_i$ output protein
concentrations of its aGRN, and can sense the concentration of each
morphogen through its $c_i$ input proteins. The perceived intensity of
a morphogen follows an inverse squared law: for any receiver positioned
at $P_r$, the perceived intensity $I_m$ of a morphogen $m$ emitted by
$N$ sources placed at positions $P_{s_i}$ with intensities $E_{m_i}$
decreases with the squared distance to each source and depends on the
attenuation coefficient $A_m$ of morphogen $m$. For each cell, we
compute the gradient of a morphogen $m$ as the averaged variation of
its intensity along the x, y and z axes, from one extremity of the cell
to the other.
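
Two of the mechanisms described above, the Darcy-style energy exchange
between connected cells and the per-cell morphogen gradient, can be
sketched in Python as follows. The function names, the reading of a
positive flow as energy leaving cell $a$ toward cell $b$, and the
finite-difference form of the gradient are assumptions made for
illustration. The intensity field is passed in as a function, so no
particular attenuation formula is assumed.

```python
def darcy_flow(level_a, level_b, area, distance, permeability, viscosity):
    """Flow F = -k * A * (level_b - level_a) / (mu * L) between cells a and b."""
    delta_p = level_b - level_a
    return -permeability * area * delta_p / (viscosity * distance)


def exchange(level_a, level_b, flow, dt):
    """Explicit step: a positive flow moves energy from a to b (assumed sign)."""
    return level_a - flow * dt, level_b + flow * dt


def morphogen_gradient(intensity, center, radius):
    """Variation of `intensity` across the cell along x, y and z.

    `intensity` is any function mapping a 3D point to a perceived value;
    the difference is taken between the two extremities of the cell.
    """
    gradient = []
    for axis in range(3):
        hi, lo = list(center), list(center)
        hi[axis] += radius
        lo[axis] -= radius
        gradient.append((intensity(tuple(hi)) - intensity(tuple(lo))) / (2.0 * radius))
    return tuple(gradient)


# Illustrative values only (hypothetical).
flow = darcy_flow(level_a=10.0, level_b=4.0, area=2.0, distance=3.0,
                  permeability=1.5, viscosity=1.0)
new_a, new_b = exchange(10.0, 4.0, flow, dt=0.5)
grad = morphogen_gradient(lambda p: 2.0 * p[0] + 3.0, (1.0, 1.0, 1.0), 0.5)
```

The exchange step conserves the total energy of the two cells, which is
what one would expect from a passive diffusion.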


Cell adhesion  In the early stages of this experiment, every cell would
automatically establish a strong connection with every other cell upon
contact. This invariably led to a collapse of morphological diversity,
especially in the water part of the world, where inertia is not
negligible. Indeed, as cells divide, they experience various forces
that propagate along the entirety of the organism. As a result,
opposite ends of an organism often come into contact, bouncing against
each other; but the automatic creation of a strong connection would
prevent the cells from separating again and would eventually lead to an
unordered blob of connected cells. In various multicellular artificial
life models, this problem is avoided because the actual simulation
stage, in which the organism is evaluated, is separated from the
development phase, where the cells are positioned and linked without
perturbations. While this simplifies things and allows for the creation
of complex morphologies without the risk of discovering a spherical
amalgamation of cells at the end of the evaluation, it also means that
we lose some of the properties of real world organisms which can be of
prime interest, especially for this experiment, which aims to get
closer to real world organism development: mainly self-repair and real
time morphology adaptation to a changing environment. To tackle this
problem, we once again take inspiration from biology by introducing a
mechanism which lets the cell decide if it wants to create new
connections or only keep the ones already existing and bounce off a
potential companion. This capacity, named "solidify", is managed by the
cells' gene regulatory network. In MecaCell, the normal algorithm for
the creation of adhesions between two cells is to "ask" them what their
reciprocal affinities are at each time step. In order to let the cells
decide when they are open to new adhesions, we add an "active
connections" list to each cell that keeps track of all its "real"
adhesions. At each time step, and for every cell, we compare this
active connections list with a candidate list of cells that are
currently colliding. A new bond is then created only if both candidates
decide not to solidify. In combination with the other proteins provided
as inputs to the aGRN (such as the cell age $t$ and its mechanical
pressure $p$), this, in theory, allows for the emergence of complex
adhesion strategies.

Cell controller - aGRN  Within our multicellular organism, each cell
has its own gene regulatory network that controls the cell life cycle.
Even though the aGRNs are physically different in the cells of the same
organism, as in nature, they share the same genetic code and thus the
same topology. When a cell division occurs, an exact clone of the
mother cell's aGRN is copied into the daughter cell. In this work, the
gene regulatory network used to control the cells is inspired by
Banzhaf's model. This model has been designed for computational
efficiency and is not meant to simulate a real biological gene
regulatory network in all its complexity.

This model is composed of a set of abstract proteins. A protein $a$ is
composed of three tags: (1) the protein tag $id_a$ that identifies the
protein, (2) the enhancer tag $enh_a$ that defines the enhancing
matching factor between two proteins, and (3) the inhibitor tag $inh_a$
that defines the inhibiting matching factor between two proteins. These
tags are coded with an integer in $[0, p]$, where the upper bound $p$
can be tuned to control the precision of the network. In addition to
these tags, a protein is also defined by its concentration, which
varies over time with particular dynamics described later. A protein
can be of three different types: input, a protein whose concentration
is provided by the environment, which regulates other proteins but is
not regulated; output, a protein with a concentration used as output of
the network, which is regulated but does not regulate other proteins;
and regulatory, an internal protein that regulates and is regulated by
other proteins.

With this structure, the dynamics of the aGRN are computed by using the
protein tags. They determine the productivity rate of the pairwise
interaction between two proteins. For this, the affinity of a protein
$a$ for another protein $b$ is given by the enhancing factor $u^+_{ab}$
(resp. the inhibiting factor $u^-_{ab}$), calculated with the Euclidean
distance between the id tag of protein $b$ and the enhancer (resp.
inhibitor) tag of protein $a$. The proteins are then compared pairwise
according to their enhancing and inhibiting factors. For a protein $a$,
the total enhancement $g_a$ and inhibition $h_a$ are given by the sum
of the exponential influences between the proteins. Two parameters,
$\beta$ and $\delta$, are used to control the dynamics of the system:
$\beta$ affects the importance of the matching factors and $\delta$ is
used to modify the production level of the proteins in the differential
equation. In summary, the lower both values are, the smoother the
regulation is; the higher the values are, the more sudden the
regulation is. The concentrations are updated with a simple
differential equation taking into account the newly produced proteins
and the destroyed ones. More details on the aGRN dynamics can be found
in (Cussat-Blanc et al., 2015).
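
The following Python sketch shows one way such dynamics can be
implemented. It is a hedged illustration, not the authors' code: the
exact affinity, normalisation and update rules are assumptions written
from the verbal description above (affinity as the precision minus the
tag distance, exponential influences scaled by $\beta$, production
scaled by $\delta$, non-input concentrations renormalised to sum to
one). The cited reference remains the authoritative source for the
equations.

```python
import math
from dataclasses import dataclass


@dataclass
class Protein:
    ident: int          # protein tag
    enhancer: int       # enhancer tag
    inhibitor: int      # inhibitor tag
    kind: str           # "input", "regulatory" or "output"
    concentration: float


def grn_step(proteins, beta, delta, precision, dt=1.0):
    """Advance the concentrations of the regulated proteins by one step."""
    regulators = [p for p in proteins if p.kind != "output"]  # influence others
    regulated = [p for p in proteins if p.kind != "input"]    # get updated
    if not regulators or not regulated:
        return

    # Assumed affinity: large when the two tags are close to each other.
    u_plus = [[precision - abs(a.enhancer - b.ident) for b in regulators]
              for a in regulated]
    u_minus = [[precision - abs(a.inhibitor - b.ident) for b in regulators]
               for a in regulated]
    u_plus_max = max(max(row) for row in u_plus)
    u_minus_max = max(max(row) for row in u_minus)

    n = len(regulators)
    changes = []
    for row_plus, row_minus in zip(u_plus, u_minus):
        g = sum(b.concentration * math.exp(beta * (u - u_plus_max))
                for b, u in zip(regulators, row_plus)) / n
        h = sum(b.concentration * math.exp(beta * (u - u_minus_max))
                for b, u in zip(regulators, row_minus)) / n
        changes.append(delta * (g - h) * dt)

    # Apply all changes at once, then renormalise (assumed normalisation).
    for protein, change in zip(regulated, changes):
        protein.concentration = max(0.0, protein.concentration + change)
    total = sum(p.concentration for p in regulated)
    if total > 0:
        for p in regulated:
            p.concentration /= total


# Tiny hypothetical network: one input, one enhanced and one inhibited output.
proteins = [
    Protein(0, 0, 0, "input", 1.0),
    Protein(1, 0, 16, "output", 0.5),   # enhancer tag matches the input tag
    Protein(2, 16, 0, "output", 0.5),   # inhibitor tag matches the input tag
]
grn_step(proteins, beta=1.0, delta=1.0, precision=16)
```

In this toy network the output whose enhancer tag matches the input
gains concentration while the other one loses it, which is the
qualitative behaviour the description implies.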

Table 1 describes the configuration of our aGRN input and output
proteins when applied to this artificial embryogenesis experiment. A
few clarifications on the role of some of these inputs and outputs are
necessary. First, the sensed nutrients ($c_n$) input represents the
actual concentration in nutrients sensed by the cell in its surrounding
environment. The current nutrients level ($n$) input is the actual
current level of nutrients in the cell. The same goes for the light
intensity sensed by the cell ($c_l$) and the current amount of light
energy accumulated in it ($l$).

Name                          Type     Description or use
$c_i, \forall i \in [0,2]$    input    concentration of morphogen $i$
$c_n$                         input    sensed nutrients
$n$                           input    current nutrients level
$c_l$                         input    sensed light intensity
$l$                           input    current light level
$t$                           input    age of the cell
$p$                           input    mechanical pressure
$o_i, \forall i \in [0,2]$    output   morphogen $i$ production
$o_n$                         output   normalisation of $o_i$
$d_i, \forall i \in [0,2]$    output   divide along morphogen $i$ gradient
$d_n$                         output   divide along nutrient gradient
$a$                           output   apoptosis
$q$                           output   quiescence
$s$                           output   solidify: no new adhesion
$st$                          output   threshold for $s$ activation
$pd$                          output   perpendicular division

Table 1: List of our artificial GRN input and output proteins.

The cells express their choices between division, quiescence or
apoptosis through the concentrations of the output proteins $d_i$, $q$
and $a$ respectively. The protein with the highest concentration
represents the cell's choice. In addition to starting a division, the
$d_i$ output proteins of the aGRN also control the cell's division
plane: each $d_i$ output protein corresponds to a morphogen, and the
$d_i$ or $d_n$ protein with maximum concentration is used to determine
the gradient (morphogen or nutrient) along which the cell must divide.
If no gradient of said morphogen or nutrient is present, the axis of
division is randomly chosen. The $pd$ protein allows the cell to choose
between a division along the selected gradient or perpendicular to it:
the division is perpendicular when the concentration of protein $pd$ is
greater than the concentration of the selected division protein.
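
The decision rule described in this paragraph can be sketched in Python
as follows. The dictionary keys (`"d0"`, `"dn"`, `"pd"`, and so on) and
the returned tuple are hypothetical conventions chosen for the example;
ties are resolved arbitrarily here, which the text does not specify.

```python
def choose_action(outputs):
    """Pick the cell's action from its aGRN output concentrations.

    `outputs` maps hypothetical names to concentrations: "d0", "d1",
    "d2" (divide along morphogen i), "dn" (divide along the nutrient
    gradient), "a" (apoptosis), "q" (quiescence), "pd" (perpendicular).
    Returns (action, gradient_name_or_None, perpendicular_flag).
    """
    division_keys = [k for k in outputs if k.startswith("d")]
    best_division = max(division_keys, key=lambda k: outputs[k])
    candidates = {"divide": outputs[best_division],
                  "apoptosis": outputs["a"],
                  "quiescence": outputs["q"]}
    action = max(candidates, key=candidates.get)
    if action != "divide":
        return action, None, False
    perpendicular = outputs["pd"] > outputs[best_division]
    return action, best_division, perpendicular


decision = choose_action({"d0": 0.1, "d1": 0.6, "d2": 0.0, "dn": 0.2,
                          "a": 0.05, "q": 0.3, "pd": 0.7})
```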

The solidify output protein $s$ controls the solidify capacity of a
cell: if the concentration of protein $s$ rises above the concentration
of the threshold protein $st$, the cell solidifies and will not accept
any more adhesions from not yet connected cells until the concentration
of protein $s$ decreases again.
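
A minimal Python sketch of this rule, combined with the "active
connections" bookkeeping described in the cell adhesion paragraph, is
given below. The data structures (sets of `frozenset` pairs, a
dictionary of solidify flags) and the function names are hypothetical
choices made for the example.

```python
def is_solid(s, st):
    """A cell solidifies when protein s exceeds the threshold protein st."""
    return s > st


def new_bonds(active, colliding, solid):
    """Return the adhesions to create at this time step.

    `active` is a set of frozenset({i, j}) pairs for existing adhesions,
    `colliding` lists the (i, j) pairs currently in contact, and `solid`
    maps each cell index to its solidify state. A bond is created only
    when the pair is not yet connected and neither cell is solid.
    """
    created = set()
    for i, j in colliding:
        pair = frozenset((i, j))
        if pair in active:
            continue  # already a real adhesion, nothing to create
        if not solid[i] and not solid[j]:
            created.add(pair)
    return created


# Cells 0-1 are already bonded; cell 3 is solid and refuses new bonds.
solid = {0: False, 1: False, 2: is_solid(0.2, 0.5), 3: is_solid(0.9, 0.5)}
created = new_bonds(active={frozenset((0, 1))},
                    colliding=[(0, 1), (1, 2), (2, 3)],
                    solid=solid)
```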

To obtain a usable GRN, both the protein tags and the 
dynamics coefficients need to be optimized. The next part 
presents the specificities of the genetic algorithm used in this 
work. 

Evolution 

One of the goals of this experiment is to explore how artificial
multicellular organisms could survive in a harsh environment without
explicitly being led toward a given strategy or morphology. We want to
explore the organisms that the rules of this world could create without
constraining the creativity of evolution through some restrictive
objective function. Therefore, the only objective for the cells is to
survive as long as they can, or more precisely, to keep at least one
copy of their DNA in our virtual world for as long as possible. This
gives full freedom to the cells on the developmental strategies they
can use and opens a wide range of possible morphologies the organisms
can develop. The drawback is that it also dramatically increases the
search space and fills it with many local optima on the way to
increased longevity. The evolutionary algorithm we use in this work to
evolve the aGRN is based on the Gene Regulatory Network Evolution
through Augmenting Topology algorithm (GRNEAT) (Cussat-Blanc et al.,
2015).

GRNEAT 

In this algorithm, inspired by the NEAT algorithm (Stanley
and Miikkulainen, 2002) and adapted to evolve gene regulatory
networks, the first population of aGRNs is initialized with
small topologies containing only input and output proteins.
The population is evaluated in the standard way, with a
fitness promoting survival time. After a 3-player tournament
selection, offspring are produced by crossover using a
protein alignment operator. This operator uses a genetic
distance metric to compute topological distances between two
aGRN proteins. Each type of protein is processed separately.
The input and the output proteins are treated with the same
method: each input (or output) protein linked to a sensor (or
an actuator) is randomly selected from one of the parents.
The regulatory proteins are then aligned before being
crossed: for each regulatory protein p_1 from the first
parent, the closest regulatory protein p_2 not yet aligned is
selected from the second parent. The distance between two
proteins is computed as follows:

$$D(A, B) = \frac{1}{p}\left(a\,|id_A - id_B| + b\,|enh_A - enh_B| + c\,|inh_A - inh_B|\right)$$

where id_x is the tag, enh_x is the enhancer tag and inh_x is
the inhibitor tag of protein x, and p is the precision of the
aGRN. a = 0.75, b = 0.125 and c = 0.125 are constants that
weight each part of the protein properties. If the distance
D(p_1, p_2) is lower than a given alignment threshold σ_a,
both proteins are aligned. Once alignment of all proteins has
been attempted, one protein of each aligned pair is randomly
selected and added to the offspring. The regulatory proteins
that failed to align in both parents are also added to the
offspring. This ensures that no crucial genetic material is
deleted during the crossover. Finally, the dynamics
coefficients are also crossed: each of the β and δ
coefficients is randomly taken from one of the parent genomes
and used in the offspring genome.
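
The alignment-based crossover of regulatory proteins can be illustrated with the short sketch below. It is a simplified reading of the description above, not the GRNEAT reference code: the `Protein` class, the function names and the example threshold are hypothetical, and the division by the precision p is an assumption about how the distance is normalized.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class Protein:
    id: int    # protein tag
    enh: int   # enhancer tag
    inh: int   # inhibitor tag


def distance(x, y, p, a=0.75, b=0.125, c=0.125):
    """Weighted tag distance; normalizing by the precision p is an assumption."""
    return (a * abs(x.id - y.id)
            + b * abs(x.enh - y.enh)
            + c * abs(x.inh - y.inh)) / p


def cross_regulatory(parent1, parent2, p, threshold, rng):
    """Align regulatory proteins of two parents and build the offspring list."""
    unaligned2 = list(parent2)
    child = []
    for p1 in parent1:
        if unaligned2:
            # Closest protein of the second parent that is not yet aligned.
            p2 = min(unaligned2, key=lambda q: distance(p1, q, p))
            if distance(p1, p2, p) < threshold:
                unaligned2.remove(p2)
                child.append(rng.choice([p1, p2]))  # keep one of the pair
                continue
        child.append(p1)          # unaligned protein of the first parent
    child.extend(unaligned2)      # unaligned proteins of the second parent
    return child
```

Keeping every unaligned protein from both parents is what makes the operator conservative: crossover can only grow or preserve the set of regulatory proteins, never silently drop one.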


Crossed-over aGRNs represent 30% of the offspring. The rest
of the offspring are built from tournament-selected genomes
of the previous generation. All offspring except the elite
(the best genome) are then subject to mutation at a 75% rate.
When mutated, a genome is modified in one of three ways: (1)
delete a protein: with a 15% probability, a randomly selected
regulatory protein, if any, is removed from the aGRN; (2) add
a protein: with a 15% probability, a randomly generated
regulatory protein is added; (3) modify a protein: with a 70%
probability, exactly one randomly chosen parameter of the
aGRN is modified, either one protein tag or one of the
dynamics coefficients.
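
As a rough illustration of the mutation scheme, the sketch below applies the stated probabilities (75% mutation rate; 15% delete, 15% add, 70% modify). The genome layout, the tag range `PRECISION`, and the coefficient range are hypothetical, and for brevity the sketch modifies only regulatory-protein tags or the two dynamics coefficients.

```python
import copy
import random

PRECISION = 32  # hypothetical tag range; not taken from the paper


def random_protein(rng):
    """A regulatory protein as [id, enhancer, inhibitor] tags."""
    return [rng.randrange(PRECISION) for _ in range(3)]


def mutate(genome, rng, rate=0.75):
    """Return a (possibly) mutated copy of `genome`.

    genome: {"regulatory": [[id, enh, inh], ...], "beta": float, "delta": float}
    """
    child = copy.deepcopy(genome)
    if rng.random() >= rate:
        return child                       # this offspring is not mutated
    regs = child["regulatory"]
    r = rng.random()
    if r < 0.15:                           # (1) delete a regulatory protein
        if regs:
            regs.pop(rng.randrange(len(regs)))
    elif r < 0.30:                         # (2) add a random regulatory protein
        regs.append(random_protein(rng))
    else:                                  # (3) modify exactly one parameter
        n_tags = 3 * len(regs)
        k = rng.randrange(n_tags + 2)      # all tags plus beta and delta
        if k < n_tags:
            regs[k // 3][k % 3] = rng.randrange(PRECISION)
        elif k == n_tags:
            child["beta"] = rng.uniform(0.5, 2.0)    # hypothetical range
        else:
            child["delta"] = rng.uniform(0.5, 2.0)   # hypothetical range
    return child
```

Working on a deep copy keeps the parent genome intact, which matters when the same tournament winner is reused to build several offspring.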

Novelty metrics 

To mitigate the adverse effects of the increased degrees of
freedom and numerous local optima in the morphological
parameter space, we added a novelty metric as defined by
Lehman and Stanley (2008). We combined this novelty score
with our main survival objective by modifying the selection
phase of our genetic algorithm: each potential parent is
selected through a tournament based on either novelty or
survival time, with a 50% chance each. While not as complex
as some other integrations of novelty in a multi-objective
genetic algorithm (Mouret, 2011), this proved sufficient to
harness some of the exploratory power of novelty. In this
experiment, we tried three different novelty metrics, which
are based on the capture (and comparison) of various aspects
of a developing phenotype:

• Nm_0 is composed of three numbers: the maximum number of
cells during the simulation, the maximum depth reached by a
cell and the total survival time (which is also the main
objective).

• Nm_1 is composed of 5 snapshots of the simulation (at
times t = 10, t = 20, t = 50, t = 75 and t = 100). Each
snapshot contains 2 numbers: the number of cells and the
maximum depth of a cell at the time of the capture.

• Nm_2 is a set of 5 captures (taken at the same time steps
as for Nm_1), each represented as a 20 x 20 integer matrix.
It is effectively a set of pictures in which each pixel's
value represents the number of cells stacked at that
position. The plane of the shot is determined through a
Principal Component Analysis on the cells' positions (it is
the most discriminant plane). This metric is meant to capture
the morphologies of the organisms in all their subtleties.
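
The mixed selection scheme can be sketched as follows. This is a minimal example under assumptions: novelty is computed here as the mean Euclidean distance to the k nearest behaviour descriptors, in the spirit of Lehman and Stanley (2008), and the value of k, the dictionary keys and the function names are hypothetical.

```python
import math
import random


def novelty(behavior, others, k=3):
    """Mean distance from `behavior` to its k nearest neighbours in `others`."""
    nearest = sorted(math.dist(behavior, o) for o in others)[:k]
    return sum(nearest) / len(nearest) if nearest else 0.0


def select_parent(population, rng, tournament_size=3):
    """Tournament on novelty or on survival time, each with a 50% chance.

    population: list of dicts with "survival" and "novelty" scores.
    """
    key = "novelty" if rng.random() < 0.5 else "survival"
    contestants = rng.sample(population, tournament_size)
    return max(contestants, key=lambda ind: ind[key])
```

For a descriptor such as Nm_0, `behavior` would be a 3-tuple (maximum cell count, maximum depth, survival time); in practice the components would probably need rescaling so that no single one dominates the distance.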

Results 

Influence of novelty 

In Figure 1, we can see the median (with first and third
quartile) survival times of the best genomes evolved during
300 generations in 10 independent runs. On a 2014 high-end
laptop, the average evaluation time for an individual was
0.2s during the first 5 generations and ended at an average
of 1.3s. The best organisms obtained with these runs are
presented in Figure 2(a, c-h). The graph in Figure 1 reveals
both the deceptiveness of the fitness landscape when survival
time is used as the only fitness objective and the beneficial
impact of novelty (Student t-test p-values are provided in
Table 2): where a classical objective-based evaluation
struggles to find solutions that pass the first local optima
(for example: not dividing and surviving on the initial
resources of the seed cell, or just doing a few divisions in
order for some cells to reach the surface and bring in a
little bit of light), the novelty-based approaches
successfully find solutions to overcome these optima and
efficiently pave the way to more robust organisms.

            Survival   Nm_0    Nm_1    Nm_2
Survival    -          0.002   0.011   0.089
Nm_0        0.002      -       0.360   0.050
Nm_1        0.011      0.360   -       0.250
Nm_2        0.089      0.050   0.250   -

Table 2: p-values of the paired Student t-tests comparing
runs with survival fitness and the different novelty
measures, calculated on 10 runs at generation 300.

Figure 1: Error bar plots of the best individuals obtained on
10 independent runs. Error bars represent the median and the
first and third quartiles. All novelty objectives help to
escape local optima; the novelty measure Nm_0 gives the best
results. The initial value of 41 obtained at generation 0
represents the survival time for a seed cell that stays
quiescent during the whole simulation.

The three novelty measures tested in this experiment suggest
that too much information causes evolution to get lost in the
vast search space: the novelty measure Nm_0 does better
overall than both other measures, and it is the one that
includes the fewest parameters. In our opinion, when too many
parameters are used to describe a phenotype, the exploration
space becomes too large and individuals with minor
differences are considered too novel. Therefore, it is
important to choose wisely the parameters that describe the
phenotypes. As shown in Table 2, the relatively high p-values
between novelty-based runs reveal the need for a broader
study on the influence of the novelty parameters in order to
find the best possible measures for evo-devo models and to
validate our preliminary results.

Developmental strategies and world setup influence 

Across all the evolutionary runs, we observed a wide
diversity of developmental strategies and morphologies,
especially when any form of novelty was involved. Figure
2(a,b) shows examples of cell arrangements obtained with
different world parameters. The distribution of nutrients in
the world was also found to have a strong influence on the
preferred strategies: as expected, large values of C_n and
P_n favored a very vertical growth of the cell colony, with
the formation of a relatively thick trunk in the ground
enabling fast nutrient and light transfer between the deep
root cells and the emerged ones. One of the most interesting
results might be the emergence of a form of reproduction
through parthenogenesis when the nutrient concentration was
uniform. Cells indeed understandably found the benefits of a
vertical growth to be incomparable with the efficiency of a
vertical growth. They also adopted, as shown in Figure 2(b),
a spreading strategy in which they would develop laterally
just below the surface. When a root cell encountered a
nutrient source, it would also divide upward (to the surface)
and the cells between the two clusters thus formed would
undergo apoptosis, creating a simple form of parthenogenesis
reminiscent of the biological reproduction of some plants.

Conclusion 

We have presented a new developmental model based on
MecaCell, a physics engine tailored for artificial life
experiments. This model shows how novelty search can help
when stepping artificial embryogenesis up to the third
dimension. Indeed, the third dimension allows for more
degrees of freedom for the multicellular organisms but also
adds complexity for the cell controller to handle. As a
result, the search space becomes much harder to explore with
a standard fitness function. In addition to the use of a 3D
developmental model, we also wanted to remove all engineering
from the main fitness objective: it targets only the survival
duration of the organisms. By using only this objective, we
showed that evolution gets stuck in one or a few local
optima, but by adding different novelty metrics based on the
organisms' morphologies and their capabilities to explore
their environment, we showed that evolution can escape from
these local optima and develop more complex morphologies and
behaviors able to survive longer in the exact same
environment.

This new developmental model opens many research
perspectives. Firstly, we need to study more precisely the
influence of the environment parameters on the multicellular
organisms. During the development of the presented
experiment, one of the major difficulties was to produce a
viable environment, easy enough to allow the organisms to
grow but difficult enough to require complex behaviors.
Balancing this is a difficult task and needs to be studied in
detail.

Figure 2: Examples of organisms obtained with the different
fitnesses: (a, c-h) survival only and novelty metrics Nm_0,
Nm_1 and Nm_2 in the novelty impact study; (a, b) same
fitness with different environmental conditions.


Once this is done, we want to produce an artificial world in
which different organisms would coexist, cooperating or
competing for survival and reproduction. This will require
the cells to be capable of specialization in order to balance
the capacities of the organisms, with for example a
light-extracting cell type and a reproductive one. We hope to
produce more complex organisms, further mimicking some
aspects of the early stages of the appearance of life on
Earth.
Abstract

In swarm robotics, simple identical robots have to be made to
coordinate in such a way that they can perform a task.
Similarly, multicellular organisms have to be able to create
spatial patterns during development using many identical
components (cells) without being able to draw on an absolute
frame of reference. Finding and understanding existing
solutions to the latter problem might therefore be a
promising route to solving the former. Cell-surface
mechanics, i.e. cell movement based on surface tension or
adhesion, is a mechanism that is known to be involved in many
basic processes of morphogenesis. We implemented a simplified
model of cell surface mechanics on the kilobot, a small robot
with limited computational power and without any spatial
orientation capabilities. Using only distance measurements to
their neighbours, kilobots were able to perform various
morphogenetic tasks.

Introduction 

In the last decades, miniaturization and efficiency increases
in processor and battery technology have led to the rise of
swarm robotics (Brambilla et al., 2013). It is built on the
promise that for many tasks a single complicated and
expensive custom robot can be replaced by a potentially more
robust swarm of simple and cheap off-the-shelf units (Barca
and Sekercioglu, 2012).

Kilobots have been developed as a new platform with the
explicit aim of making swarm robotics affordable for
individuals or institutions with a limited budget (Rubenstein
et al., 2012). Accordingly, their design prioritizes
simplicity and low hardware costs, resulting in very limited
capabilities.

While great advances have been made on the hardware side,
programming robot swarms is still a challenge (Rubenstein et
al., 2014). Swarm members have to differentiate from
essentially identical initial states to a structured entity
where different units or groups of units perform different
activities, solely based on local communication.

A similar challenge is faced by most multicellular organisms
during ontogeny. Starting with identical units of limited
complexity that only have access to local information and
communication, they have to create spatial and temporal
patterns in order to develop differentiated tissues and
organs.

Mixed masses of animal cells of different tissue types will
rearrange over time, leading to a number of different spatial
configurations (Glazier and Graner, 1993). This sorting
process is based on properties of the cells and their
membranes. Although its biophysics are not completely
understood, phenomenologically it can be modelled with a high
degree of accuracy (Brodland, 2004; Maree et al., 2007).

In the past, various attempts have been made to translate
biological mechanisms of pattern formation into robots (e.g.
Zahadat et al., 2013). Here we present an implementation of a
cell surface mechanics-like system on the kilobot platform
enabling basic morphogenetic processes.

Methods 

Cell surface mechanics is based on the idea that interfaces
between cells differ in their adhesion and/or surface tension
depending on the types of the cells on both sides. In analogy
to the formalisms applied e.g. in modelling foams, the free
energy of a given membrane configuration can then be
calculated. Local thermodynamic fluctuations in membrane
state will produce stochastic changes in membrane
configuration with a corresponding change in free energy.
Assuming that changes to a lower energy state are more
likely, the system as a whole will tend towards a decrease in
free energy. Depending on the combination of cell types,
three main equilibrium patterns have been found: cells will
arrange in a mixed or checkerboard pattern, they can separate
by type into two clusters, or one cell type can form a layer
engulfing the other one (Glazier and Graner, 1993).

Various types of formalisms with different degrees of
abstraction have been used to model cell surface mechanics
(see review by Brodland, 2004). Some models track microscopic
changes in membrane configuration and thus enable large-scale
changes in cell shape; others operate on the level of entire
cells and provide less flexibility.

For our system we chose single cells as the basic unit of
abstraction. We let each bot represent the centre of a cell
with a circular "virtual cell" of a fixed radius extending
around it. If the virtual cells of two bots overlap, we
assume they form an interfacial membrane on the line between
the two intersection points of their circular cell surfaces.
We use the standard formulation of the Hamiltonian to
calculate the energy for a given set of bot positions and the
resulting membrane configuration. Given segment lengths l_i,
surface energy constant J and cell types at segment i, τ(i)
and τ'(i), respectively, we obtain:

$$H = \sum_i J(\tau(i), \tau'(i)) \cdot l_i$$

Figure 1: Robot positions after 50k seconds for mixing (left,
J_{1,1} = 8, J_{2,2} = 8, J_{1,2} = 2), separation (middle,
2, 2, 8) and engulfment (right, 2, 12, 8). J_m = 10 in all
cases.

Figure 2: Sortedness or degree of engulfment, respectively,
over time (in seconds) in 10 random replicates for mixing
(left), separation (middle) and engulfment (right).

Kilobots can measure the distance to their neighbours, but
are not able to determine their own or their neighbours'
absolute position. We therefore approximate the degree to
which the overlap between neighbouring interface segments
reduces overall cell surface by calculating segment length
as $l_i = l_{i,\mathrm{orig}} / (1 + \sum_j l_{j,\mathrm{orig}} / c)$.

Minimization of the energy state of the system happens by
means of a Monte Carlo process: each robot measures the
distances to its neighbours and from that calculates its
current surface configuration. Then it performs a random
movement step (within the limits of the kilobots'
locomotion), after which it measures distances again. Based
on this information the change in surface lengths and
consequently the change in energy is calculated. If the
result is favourable (i.e. H' < H) the robot maintains its
new position. Otherwise it moves back to its starting point.
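
The energy calculation and the accept-or-revert rule can be sketched from a single robot's point of view as follows. This is an illustrative simplification, not the code that ran on the kilobots: the radius `R`, the callback names, and the dictionary layout of J are hypothetical, and the sketch omits both the overlap correction of segment lengths and any interface with the surrounding medium.

```python
import math

R = 1.0  # hypothetical radius of the circular "virtual cell"


def segment_length(d, radius=R):
    """Chord joining the intersection points of two equal circles whose
    centres are a distance d apart (zero if the circles do not overlap)."""
    if d >= 2.0 * radius:
        return 0.0
    return 2.0 * math.sqrt(radius ** 2 - (d / 2.0) ** 2)


def energy(own_type, neighbours, J):
    """Local energy: sum of J(type, type') times interface length.

    neighbours: list of (distance, neighbour_type) pairs.
    J:          dict keyed by the sorted pair of cell types.
    """
    return sum(J[tuple(sorted((own_type, t)))] * segment_length(d)
               for d, t in neighbours)


def monte_carlo_step(own_type, measure, try_move, undo_move, J):
    """One trial move: keep it only if the local energy decreases."""
    h_before = energy(own_type, measure(), J)
    try_move()                       # random movement step
    h_after = energy(own_type, measure(), J)
    if h_after < h_before:
        return True                  # favourable: stay at the new position
    undo_move()                      # otherwise return to the start point
    return False
```

Only distances enter the calculation, which mirrors the constraint that a kilobot knows how far away its neighbours are but not where they are.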

In simulations of real cells, different equilibrium states of
a cell cluster will be reached depending on specific
combinations of values of the surface energy constants
J(a, b) (Glazier and Graner, 1993). Using a freely available,
accurate kilobot simulator (Jansson et al., 2015) we tested
values of J corresponding to each of the end configurations
mixing, separation and engulfment. We simulated 200 robots
(100 per cell type) moving for 50,000 simulated seconds
starting from random initial positions.


Results 

For all three configurations (mixing, separation and
engulfment) the spatial configuration of the robots clearly
corresponds to the expected pattern (see fig. 1). Emergence
of the patterns is robust against starting conditions
(fig. 2).

Conclusion 

We have successfully implemented a key morphogenetic process
in a low-capability swarm robot system. Despite the high
stochasticity and the lack of information on absolute or
relative position, membrane-based changes in local potential
energy lead to large-scale patterning in the robot swarm
analogous to that observed in real tissues.

References 

Barca, J. C. and Sekercioglu, Y. A. (2012). Swarm robotics
reviewed. Robotica, 31(July 2012):1-15.

Brambilla, M., Ferrante, E., Birattari, M., and Dorigo, M.
(2013). Swarm robotics: a review from the swarm
engineering perspective. Swarm Intelligence, 7(1):1-41.

Brodland, G. W. (2004). Computational modeling of cell
sorting, tissue engulfment, and related phenomena: A
review. Applied Mechanics Reviews, 57(1):47.

Glazier, J. A. and Graner, F. (1993). Simulation of the
differential adhesion driven rearrangement of biological
cells. Physical Review E, 47(3):2128-2154.

Jansson, F., Hartley, M., Hinsch, M., Slavkov, I., Carranza,
N., Olsson, T. S. G., Dries, R. M., Gronqvist, J. H.,
Maree, A. F. M., Sharpe, J., Kaandorp, J. A., and
Grieneisen, V. A. (2015). Kilombo: a Kilobot simulator
to enable effective research in swarm robotics. CoRR,
abs/1511.0.

Maree, A. F. M., Grieneisen, V. A., and Hogeweg, P. (2007).
The Cellular Potts Model and biophysical properties
of cells, tissues and morphogenesis. In Anderson, A.
R. A., Chaplain, M. A. J., and Rejniak, K. A., editors,
Single-cell-based models in Biology and Medicine,
pages 107-136. Basel.

Rubenstein, M., Ahler, C., and Nagpal, R. (2012). Kilobot:
A low cost scalable robot system for collective
behaviors. In Proceedings - IEEE International Conference
on Robotics and Automation, pages 3293-3298.

Rubenstein, M., Cornejo, A., and Nagpal, R. (2014).
Programmable self-assembly in a thousand-robot swarm.
Science, 345(6198):795-799.

Zahadat, P., Schmickl, T., and Crailsheim, K. (2013).
Evolution of Spatial Pattern Formation by Autonomous
Bio-Inspired Cellular Controllers. Advances in Artificial
Life, ECAL 2013, pages 721-728.


The Relationship between Microscopic and Collective Properties 
in Gene Regulatory Network-based Morphogenetic Systems 

Hyobin Kim 1,2 and Hiroki Sayama 1,2 

1 Department of Systems Science and Industrial Engineering 
2 Center for Collective Dynamics of Complex Systems 
Binghamton University, State University of New York, Binghamton, NY, USA 
hldm240@binghamton.edu 


Abstract 

Gene regulatory network (GRN)-based morphogenetic systems
have recently attracted increasing attention in artificial life and
morphogenetic engineering research. However, the relationship
between microscopic properties of intracellular GRNs and
collective properties of morphogenetic systems has not been fully
explored yet. Thus, we propose a new GRN-based framework to
elucidate how critical dynamics of GRNs in individual cells
affect cell fates such as proliferation, apoptosis, and
differentiation in the resulting morphogenetic systems. Our model
represents an aggregation of cells, where each cell has a GRN in
it. We used Kauffman's NK Boolean networks for GRNs.
Specifically, we randomly assigned three cell fates to the
attractors. Varying the properties of GRNs from ordered, through
critical, to chaotic regimes, we observed the process by which
cells aggregate. We found that the criticality of a GRN produced
an optimal partition of basins of attraction, which led to a
maximum balance between cell fates. Based on this result, we
conclude that the criticality of a GRN is an important controller
of the frequencies of cell fates in morphogenetic systems.

Gene regulatory network (GRN)-based morphogenetic systems
have been actively developed and their properties have been
studied in artificial life and morphogenetic engineering
(Doursat, 2008; Schramm et al., 2012). However, the
relationship between microscopic properties of intracellular
GRNs and collective properties of morphogenetic systems has
not been fully explored yet.

Here, we study the relationship between the critical dynamics 
of GRNs in cells and cellular functions performed in 
morphogenetic systems at a collective level. We used 
Kauffman’s NK Boolean network as a model of GRNs 
(Kauffman, 1969, 1993, 1996). In NK Boolean networks, a 
dynamic attractor can be considered as a cellular function or a 
cell type. Thus, staying in different attractors can be interpreted 
as a dynamical representation of the cellular function. There 
exists much experimental evidence to support this view of 
cellular dynamics (Huang et al. 2005; Chang et al. 2008). Based 
on this view, Huang explained stochastic and reversible 
switching between cell fates using NK Boolean networks 
(Huang, 1999; Huang and Ingber, 2000). 

Extending Huang's conceptual framework, we implemented
NK Boolean network-based morphogenetic systems. In our
model, we assumed that a cell has three fundamental cellular
functions: proliferation, apoptosis, and differentiation. Our
model represents an aggregation of cells, where each cell has an
identical random NK Boolean network which consists of 20
nodes. By adjusting the in-degree (K) of the nodes of a GRN in
the model, we can obtain various properties of GRNs from ordered,
through critical, to chaotic regimes; K=1 is ordered, K=2 is
critical, and K>2 is chaotic [3,4,5]. We generated random
GRNs from K=1 to K=4. For each GRN, we randomly chose
one attractor and assigned it the cellular function of
proliferation. If any other attractor was available, we chose
another attractor randomly and assigned it the cellular function
of apoptosis. If there was still an attractor available, we chose
one randomly and assigned it the cellular function of
differentiation. This means that, if a GRN has only one
attractor, it conducts only proliferation. If it has two attractors,
it performs proliferation and apoptosis. With three or more
attractors, all the cell fates assumed in the model can take place.
Fig. 1(a) is a schematic diagram that shows three cell fates
randomly assigned in a GRN which has more than three
attractors.
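
The fate-assignment procedure can be illustrated on a small random Boolean network. The sketch below is not the authors' implementation: the function names are hypothetical, inputs are assumed to be K distinct nodes with synchronous updating, and the exhaustive enumeration of the state space is only practical for networks much smaller than the 20-node GRNs used in the model.

```python
import random

FATES = ["proliferation", "apoptosis", "differentiation"]


def random_nk_network(n, k, rng):
    """Random network: each node gets k distinct inputs and a truth table."""
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randrange(2) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables


def step(state, inputs, tables):
    """Synchronous update; `state` is an integer bitmask of node values."""
    nxt = 0
    for node, (ins, table) in enumerate(zip(inputs, tables)):
        idx = 0
        for src in ins:
            idx = (idx << 1) | ((state >> src) & 1)
        nxt |= table[idx] << node
    return nxt


def basins(n, inputs, tables):
    """Map each attractor (frozenset of states) to the size of its basin."""
    attractor_of, sizes = {}, {}
    for start in range(2 ** n):
        path, seen, s = [], set(), start
        while s not in attractor_of and s not in seen:
            seen.add(s)
            path.append(s)
            s = step(s, inputs, tables)
        if s in attractor_of:
            att = attractor_of[s]                  # joined a known basin
        else:
            att = frozenset(path[path.index(s):])  # closed a new cycle
            sizes[att] = 0
        for q in path:
            attractor_of[q] = att
        sizes[att] += len(path)
    return sizes


def assign_fates(attractors, rng):
    """Assign fates in order to randomly chosen distinct attractors."""
    chosen = rng.sample(attractors, min(len(FATES), len(attractors)))
    return dict(zip(FATES, chosen))
```

Because `zip` stops at the shorter sequence, a network with a single attractor receives only proliferation and one with two attractors receives proliferation and apoptosis, matching the rule stated above.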

Jumping from one cell fate to another may occur in every 
time step by perturbations in internal gene expression caused by 
cell-cell interactions within a morphogenetic system. 
Specifically, cells interact with one another through the 
transport of signal molecules between the environment and 
cells. The transport occurs through diffusion by the 
concentration difference of signal molecules. If the 
concentration of a signal molecule is beyond a certain 
threshold, it can control the expression of assigned genes. 

Our morphogenetic model starts from one seed cell. The
change of concentrations of signal molecules by diffusion leads
to a change of gene expression. The altered gene expression
finally converges to one attractor. If the converged attractor is
proliferation, the cell divides into two at the next time step:
one is the mother cell and the other is the daughter cell. The
two cells each receive half of the signal molecule concentrations
that the mother cell had before division, and they have the same
GRN. Through the process of proliferation, the aggregated cells
therefore all carry the same GRN. If the converged attractor is
apoptosis, the cell dies. Once a cell is dead, it remains in place
for every subsequent time step. This allows us to examine how
apoptosis influences morphology, based on the biological fact
that apoptotic cell death contributes to morphology (e.g. the
separation of fingers and toes in development). Last, if the
converged attractor is differentiation, the cell is regarded as
differentiated. In our model, we assumed that cells staying in
proliferation and differentiation states continue to switch
between cell fates.

Figure 1: (a) Schematic diagram of three randomly assigned cell fates in a GRN which has more than three attractors. Each node represents a cell's
dynamical state. Orange nodes are attractors. (b) Average basin entropy for K=1-4. (c) Average states entropy of cell fates performed in simulations
at each time step for K=1-4.

To investigate the structure of basins for cell fates according
to the properties of GRNs, we applied a revised basin entropy,
using log base two. Basin entropy is a measure of the
complexity of information that a system is capable of storing
(Krawitz and Shmulevich, 2007). In the context of GRNs, the
basin entropy represents the effective functional versatility of
the cell. Krawitz's original basin entropy is computed over
all the attractors and their basins. Here, focusing instead on
the basins to which the three cell fates are assigned, we
calculated the values of basin entropy based on the relative
sizes of those basins. Fig. 1(b) shows the average basin entropy
for cell fates from K=1 to K=4. The average basin entropy is
highest at K=2, i.e., the GRNs' basins of the three cell fates
are most evenly distributed at K=2.
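
A revised basin entropy of this kind can be computed as a base-2 Shannon entropy over the relative sizes of the fate basins. The sketch below shows one plausible reading of the description; the function name is hypothetical and the exact normalization used by the authors is not stated in the text.

```python
import math


def fate_entropy(sizes):
    """Base-2 Shannon entropy of the relative sizes in `sizes`.

    sizes: basin sizes (or cell counts) of the cell fates; zeros are skipped.
    """
    total = sum(sizes)
    if total == 0:
        return 0.0
    return -sum((s / total) * math.log2(s / total) for s in sizes if s > 0)
```

With three fates the value ranges from 0 (a single fate holds everything) to log2(3), about 1.585 (three equal shares). The same function applies to counts of cells per fate at a given time step.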

For each group, we conducted 100 independent
computational simulations of morphogenetic cell growth
processes on a 2D spatial grid for t=0-4. By counting the
numbers of cells expressing the proliferation, apoptosis, or
differentiation state at each time step in those simulations, we
obtained the average values of cell states entropy based on
relative frequencies (see Fig. 1(c)). Because there are too few
cells at t=1 to capture distinct differences with K, we
excluded a graph for the average values at t=1. As seen in Fig.
1(c), the cell states entropy is highest at K=2 for t=2-4,
which means that when GRNs are critical, cells aggregate while
keeping the three cell fates most balanced at each time step.
The result confirmed that GRNs at K=2 have a maximum
balance between cell fates. From an evolutionary point of view,
the maximum balance has a significant implication: under any
environmental changes, key cellular functions, such as
proliferation, apoptosis, and differentiation, must be maintained
in balance. Because those cellular functions are expressed by
the attractors of a GRN in a cell, if the basins of cellular
functions are distributed more evenly, the cell fates can better
persist against environmental changes, which may work as a
selective advantage in the process of evolution.


Our finding suggests that the criticality of a GRN may play
an important role in modulating the frequencies of cell fates in
morphogenetic systems. To obtain more theoretical/empirical
support for this suggestion, we plan to conduct large-scale
evolutionary simulations of the ecologies of morphogenetic
systems where the evolutionary success of multicellular
organisms is determined by implicit fitness functions.

This material is based upon work supported by the US 
National Science Foundation under Grant No. 1319152. 
Abstract

Many effective and innovative survival mechanisms used by
natural organisms rely on the capacity for phenotypic
plasticity; that is, the ability of a genotype to alter how it
is expressed based on the current environmental conditions.
Understanding the evolution of phenotypic plasticity is an
important step towards understanding the origins of many types
of biological complexity, as well as to meeting challenges in
evolutionary computation where dynamic solutions are required.
Here, we leverage the Avida Digital Evolution Platform to
experimentally explore the selective pressures and
evolutionary pathways that lead to phenotypic plasticity. We
present evolved lineages wherein unconditional traits tend to
evolve first; next, imprecise forms of phenotypic plasticity
often appear before optimal forms finally evolve. We visualize
the phenotypic states traversed by evolved lineages across
environments with differing rates of mutations and
environmental change. We see that under all conditions,
populations can fail to evolve phenotypic plasticity, instead
relying on mutation-based solutions.

Introduction 

Phenotypic plasticity is the capacity for a genotype to
express different phenotypes in response to different
environmental conditions (Ghalambor et al., 2010) and is
ubiquitous throughout nature. The capacity for phenotypic
plasticity is central to many complex traits and developmental
patterns found in nature and often serves as a key strategy
employed by organisms to respond to spatially and temporally
variable environments (Bradshaw, 1965; Murren et al., 2015).
For example, Daphnia pulex use plasticity to differentially
invest in morphological defenses during development,
depending on the presence of predators in their local
environment (Black and Dodson, 1990). Genetically
homogeneous cells in a developing multicellular organism
leverage their capacity for phenotypic plasticity to
coordinate their expression patterns through environmental
signals (Schlichting, 2003). Thus, understanding the evolution
of plasticity is an important step toward a deeper
understanding of biological complexity.

Phenotypic plasticity also has practical applications in the
field of evolutionary computation, where evolution by natural
selection is harnessed to solve challenging computational and
engineering problems. In many realistic problem domains,
conditions are noisy or change cyclically. Plasticity could
enable solutions to dynamically respond to changing problem
conditions and be robust to noise. Both the biological and
evolutionary computation domains motivate the following
questions: (1) Under what conditions does phenotypic
plasticity evolve? And (2), what are the evolutionary
stepping stones for phenotypic plasticity?

Ghalambor et al. identify four conditions that are neces¬ 
sary for phenotypic plasticity to evolve: (1) populations are 
exposed to temporally or spatially varying environments, (2) 
the environments are differentiable by reliable signals, (3) 
different environments favor different phenotypes, and (4) 
no single phenotype can exhibit high fitness across all en¬ 
vironments (Ghalambor et al., 2010). Theoretical and em¬ 
pirical findings support that phenotypic plasticity can evolve 
under these conditions in both natural and artificial systems 
(Clune et al., 2007; Goldsby et al., 2010, 2014; Hallsson and 
Bjorklund, 2012; Nolfi et al., 1994). 

In addition to exploring the conditions that facilitate the 
evolutionary origin of phenotypic plasticity, it is also impor¬ 
tant to explore the step-by-step process in which plasticity 
actually evolves. What are the recurring themes as evolution 
progresses toward more plastic strategies? Are there 
genotypic or phenotypic patterns present in lineages leading 
to phenotypically plastic organisms? These types of ques¬ 
tions are especially difficult to address in laboratory sys¬ 
tems due to the slow pace of natural evolution, imperfec¬ 
tions in lineage tracking, and the difficulty of acquiring high- 
resolution data on genotypes and phenotypes. As such, arti¬ 
ficial life systems are the most effective way to observe and 
analyze the process by which phenotypic plasticity evolves. 

Here, we use the Avida Digital Evolution Platform (Ofria 
et al., 2009) to explore the process by which phenotypic 
plasticity evolves in a fluctuating environment. We exper¬ 
imentally address two questions related to the evolution of 
phenotypic plasticity. First, do digital organisms evolve to 
express traits unconditionally before evolving to condition¬ 
ally express them as a function of their environment, and 
do sub-optimal forms of plasticity evolve before more optimal 
forms of plasticity? Second, how do mutation rate and 
environmental fluctuation rate affect the evolution of phe¬ 
notypic plasticity? We also examine alternative evolution¬ 
ary strategies to phenotypic plasticity in fluctuating environ¬ 
ments and see evidence for bet-hedging strategies that use 
mutationally induced phenotype switching as a substitute for 
sensory-dependent plasticity. 

Methods 

The Avida Digital Evolution Platform 

The Avida software provides a computational instance of 
evolution and enables researchers to experimentally test hy¬ 
potheses about evolution that would otherwise be difficult 
or impossible to test in natural systems (Ofria et al., 2009). 
Avida has been demonstrated to have a robust genetic en¬ 
coding; all possible genetic sequences are well-defined in 
any context (Ofria et al., 2009). Avida has also been shown 
to be capable of evolving to use a wide range of capabil¬ 
ities (Bryson and Ofria, 2013), making it an ideal choice 
for studying phenotypic plasticity. Here, we provide a brief 
overview of Avida as it is relevant to this work. 

Digital Organisms Populations in Avida are made up of 
self-replicating computer programs that compete for space 
in a finite, toroidal grid. Each of these digital organisms is 
defined by a sequence of instructions (i.e. its genotype), vir¬ 
tual hardware to execute the instructions, and a position on 
the grid. The instruction set of Avida is Turing-Complete 
and enables organisms to perform basic computations, con¬ 
trol their own execution flow, and replicate. An organism’s 
virtual hardware (Figure 1) includes components such as a 
central processing unit (CPU), registers used for computa¬ 
tion, input and output buffers, and memory stacks. Organ¬ 
isms replicate asexually by copying themselves line-by-line 
and dividing; however, an organism’s copy instruction is im¬ 
perfect, which can result in mutated offspring. 

Organisms can gain additional CPU cycles by perform¬ 
ing tasks - such as mathematical computations - to im¬ 
prove their metabolic rate. An organism’s metabolic rate 
determines how rapidly it can execute its genome; a higher 
metabolic rate allows an organism to replicate faster. Ini¬ 
tially, an organism’s metabolic rate is roughly proportional 
to its genome length; however, the organism’s metabolic rate 
can be adjusted when the organism completes a task. In this 
way, performance of tasks can be differentially rewarded or 
punished. When an organism successfully replicates, its off¬ 
spring is placed in a random location in the world, replac¬ 
ing the organism formerly occupying that location. In this 
way, becoming a more efficient replicator in Avida is ad¬ 
vantageous in the competition for space. The combination 
of competition for replication efficiency and heritable varia¬ 
tion due to imperfect copying during the replication process 
results in evolution by natural selection. 


Sensing in Avida In a typical Avida run, organisms must 
execute an instruction called IO to output the result of a com¬ 
putation. That output is analyzed to determine if any tasks 
have been performed, and if so, the organism is appropri¬ 
ately rewarded or punished. However, in this default sce¬ 
nario, organisms cannot sense the result, even after the task 
has been performed. To provide organisms with a mechanism 
to sense their environment, we added an IO-Sense instruction 
to the set of available instructions (see footnote 1). 

The IO-Sense instruction simulates IO and provides the 
organism with feedback on what would have happened if 
the organism had executed an IO instruction instead. This 
separation of IO performance and sensing allows organisms 
to determine whether or not a particular task is being pun¬ 
ished without the risk of punishment, lowering the potential 
cost of sensing. If an IO operation would have resulted in a 
punishment, a -1 is added to the top of the organism’s stack 
memory; if it would have resulted in a reward, a 1 is placed 
there. If an IO operation would have resulted in neither a re¬ 
ward nor a punishment, a 0 is placed on the organism’s stack 
memory. In this way, organisms are able to sense whether 
or not a particular computational task is being rewarded or 
punished in their current environment and are able to react 
accordingly. 
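
The feedback rule described above can be sketched in a few lines of Python. This is only an illustration of the -1/0/1 convention, not Avida's implementation: the function name `io_sense`, the `reactions` mapping, and the use of a Python list as the stack are all assumptions made for the example.

```python
# Illustrative sketch only; not Avida source code.
# `reactions` is a hypothetical mapping from task name to the reaction that
# the current environment would apply: 1 (reward) or -1 (punishment).

def io_sense(task, reactions, stack):
    """Push the feedback an IO would have triggered, without applying it.

    task      -- name of the task the output would have completed, or None
    reactions -- dict mapping task name to 1 (rewarded) or -1 (punished)
    stack     -- list standing in for the organism's stack memory
    """
    # Tasks that are absent from `reactions` are neither rewarded nor punished.
    feedback = reactions.get(task, 0) if task is not None else 0
    stack.append(feedback)  # the organism can branch on this value later
    return feedback


# Hypothetical environment in which NAND is rewarded and NOT is punished.
example_reactions = {"NAND": 1, "NOT": -1}
example_stack = []
io_sense("NOT", example_reactions, example_stack)   # pushes -1, no penalty paid
io_sense("NAND", example_reactions, example_stack)  # pushes 1, no reward paid
```

The point of the separation is visible in the sketch: the call only records what would have happened, so probing a punished task costs the organism nothing beyond the instruction itself.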

Identifying Phenotypic Plasticity in Avida We define a 
phenotypically plastic organism in Avida as an organism that 
leverages sensory information to alter the phenotype that 
it expresses based on its current environment. We restrict 
the definition of an organism’s phenotype to the set of 
unique tasks it performs in the target environment. We don’t 
consider how many times an organism performs a task in a 
given environment, but only whether the organism does the 
task at all. Thus, to be phenotypically plastic, an organism 
must express a different task profile - perform different tasks 
- in different environments. 

Experimental Design 

To explore the evolutionary history of phenotypically plastic 
organisms, we used an experimental design based on (Clune 
et al., 2007). 

Environments We constructed two experimental environ¬ 
ments named ENV-NAND and ENV-NOT. In ENV-NAND, 
organisms were rewarded for performing the NAND logi¬ 
cal task but were punished for performing the NOT logical 
task. Conversely, in ENV-NOT, organisms were rewarded 
for performing the NOT logical task but were punished for 
performing the NAND logical task. In each of our experimental 
treatments, we cycled between these two environmental 
conditions. In this way, genotypes with the capacity 
to sense the current environment and express the appropriate 
task had a competitive advantage over phenotypically 
non-plastic organisms. 

1 IO-Sense is based on the IO-Feedback instruction implemented 
in (Clune et al., 2007), which worked exactly as the default 
IO instruction, but provided the organism with feedback on 
the result. Thus, with IO-Feedback, an organism must first do 
a particular task once - and potentially get punished - to sense 
whether or not the task is beneficial. 

Figure 1: A visual representation of the default virtual hardware used by organisms in Avida. Original figure from: (Ofria et al., 2009) 

Phenotypes Given our simple definition of a phenotype, 
there are only four possible phenotypes in each of the 
two previously described environments: (1) perform only 
NAND, (2) perform only NOT, (3) perform both NAND and 
NOT, and (4) perform neither NAND nor NOT. When con¬ 
sidering an organism’s phenotype across both ENV-NAND 
and ENV-NOT, there are sixteen possible combinations. We 
enumerate these phenotypes in Figure 2. Of these sixteen 
possible phenotypes, only four express the identical task 
profile in both environments; the other 12 all exhibit some 
form of plasticity. The optimal form of plasticity is to per¬ 
form only the NAND task in ENV-NAND and to perform 
only the NOT task in ENV-NOT; any other form of plastic¬ 
ity is sub-optimal. There are five possible phenotypes that 
leverage plasticity to perform punished tasks instead of re¬ 
warded tasks in a given environment; we did not expect these 
forms of phenotypic plasticity to be successful. 
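
The counts quoted above (sixteen complete phenotypes, four non-plastic, twelve plastic, one optimal) can be checked by brute-force enumeration. The sketch below is a minimal illustration; the variable and function names are ours, and it does not attempt to reproduce the finer beneficial/neutral/detrimental grouping used in Figure 2.

```python
from itertools import product

TASKS = ("NAND", "NOT")
# Rewarded task in each environment, in the order (ENV-NAND, ENV-NOT).
REWARDED = ("NAND", "NOT")

# A per-environment task profile is any subset of the two tasks (4 options).
profiles = [
    frozenset(task for task, done in zip(TASKS, bits) if done)
    for bits in product((False, True), repeat=len(TASKS))
]

# A complete phenotype pairs a profile in ENV-NAND with a profile in ENV-NOT.
phenotypes = list(product(profiles, repeat=2))


def is_plastic(phenotype):
    """Plastic means the task profile differs between the two environments."""
    in_env_nand, in_env_not = phenotype
    return in_env_nand != in_env_not


def is_optimal(phenotype):
    """Optimal means only the rewarded task is performed in each environment."""
    return all(
        profile == frozenset({rewarded})
        for profile, rewarded in zip(phenotype, REWARDED)
    )


n_total = len(phenotypes)
n_plastic = sum(is_plastic(p) for p in phenotypes)
n_optimal = sum(is_optimal(p) for p in phenotypes)
```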

Treatments Our experimental design consisted of five 
treatments and a control: (1) a baseline treatment with 
a moderate point-mutation rate and environmental-cycle 
length, (2) a low-mutation-rate treatment, (3) a high- 
mutation-rate treatment, (4) a short-environment-cycle- 
length treatment, (5) a long-environment-cycle-length treat¬ 
ment, and (6) a control where both NAND and NOT were 
rewarded and the environment did not fluctuate. See Table 1 
for treatment details. 

Treatment                        Point-mutation Rate   Environment Cycle Length 
Baseline                         0.0075                100 updates 
Low Mutation Rate                0.0025                100 updates 
High Mutation Rate               0.0125                100 updates 
Short Environment Cycle Length   0.0075                50 updates 
Long Environment Cycle Length    0.0075                200 updates 

Table 1: Differences among the five experimental treatments. 
Point-mutation rate is given as mutations per instruction copied. 
Environment cycle length describes the length of time (in updates) 
an environment is active before toggling to the alternative environment. 

We created the baseline treatment to produce phenotypically 
plastic organisms for lineage analysis. We limited 
the population size to 3600 organisms and seeded the world 
with an ancestral genotype capable only of self-replication. 
We then evolved populations for 100,000 updates (see footnote 2) 
in Avida. We imposed a 0.0075 probability of point-mutation per 
instruction copied, as well as a 0.05 probability each of a 
single-instruction insertion and a single-instruction deletion 
per genome copied. We fluctuated the current environment between 
ENV-NAND and ENV-NOT every 100 updates in the baseline treatment. 
We ran 50 replicates of each treatment, including the con¬ 
trol. 
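
As an illustration of how the treatments in Table 1 translate into an environmental schedule, the following sketch encodes each treatment and computes which environment is active at a given update. It is not Avida configuration: the `Treatment` class and `environment_at` function are hypothetical names, and starting in ENV-NAND is an assumption, since the text does not state which environment comes first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Treatment:
    name: str
    point_mutation_rate: float  # per instruction copied
    cycle_length: int           # updates before the environment toggles


# Values copied from Table 1.
TREATMENTS = [
    Treatment("Baseline", 0.0075, 100),
    Treatment("Low Mutation Rate", 0.0025, 100),
    Treatment("High Mutation Rate", 0.0125, 100),
    Treatment("Short Environment Cycle Length", 0.0075, 50),
    Treatment("Long Environment Cycle Length", 0.0075, 200),
]


def environment_at(update, cycle_length, first="ENV-NAND", second="ENV-NOT"):
    """Return the active environment at `update`.

    Assumption: the run starts in `first`; the source does not say which
    environment is active at update 0.
    """
    return first if (update // cycle_length) % 2 == 0 else second


def number_of_switches(total_updates, cycle_length):
    """Count environment toggles occurring strictly before `total_updates`."""
    return (total_updates - 1) // cycle_length
```

Under these assumptions, a 100,000-update baseline run would contain `number_of_switches(100_000, 100)` toggles, while the long-cycle treatment would contain half as many.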


2 An update in Avida is an experimental length of time. One 
update is defined as the amount of time it takes for the average 
organism to execute 30 instructions (see (Ofria et al., 2009) for 
more details). 

Results and Discussion 



Figure 2: Enumeration of all possible complete phenotypes. Each 
row represents a distinct phenotype. A green ‘X’ indicates that the 
associated task is performed in the specified environment, while a 
red mark indicates that the task is not performed. For each environment, 
the column of the rewarded task is highlighted in green, and 
the column of the punished task is highlighted in red. A green ‘X’ 
in a green column or a red mark in a red column is optimal. Each 
phenotype has a color code, which is used in our visualization tool. 
Note that the first four rows are non-plastic phenotypes, rows 5-8 
exhibit partially beneficial plasticity, and row 9 is optimally bene¬ 
ficial. Rows 10-11 are mostly neutral, while rows 12-16 are detri¬ 
mental forms of plasticity. 


Lineage Visualization To explore evolutionary strategies 
evolved in fluctuating environments, we visualized the lin¬ 
eages of evolved genotypes as vertical bars where time (in 
updates) proceeds from top to bottom beginning with the 
lineage’s original ancestor genotype. Any given genotype 
on the lineage must express one of the sixteen possible phe¬ 
notypes enumerated in Figure 2. At each point in time, 
the color of the visualized lineage corresponds to the color 
representing the phenotype expressed by the lineage at that 
point in time. For example, because the ancestral organ¬ 
ism is capable only of self-replication, all visualized lin¬ 
eages should show that the original ancestor’s phenotype 
performed neither the NAND task nor the NOT task. In 
addition to the visualized lineages, we indicate the actual 
environmental conditions experienced by the evolving pop¬ 
ulations at each point in time by the color of the vertical 
axis. This type of visualization displays the phenotypic 
states traversed by any given lineage, which allowed 
us to explore evolutionary strategies leveraged by all evolved 
lineages. 


What conditions promote the evolution of 
phenotypic plasticity? 

Ghalambor et al. identified four environmentally-dependent 
requirements for the evolution of phenotypic plasticity 
(Ghalambor et al., 2010), and our experimental design conforms 
to these conditions, enabling us to test their validity 
and relative importance. The oscillation between ENV-NAND 
and ENV-NOT provides temporal variation. The IO-Sense 
instruction reliably indicates the current environment. 
The two environments favor opposing phenotypic traits, and 
the only way for an individual organism to achieve a high 
fitness in both is to alter its phenotypic expression. Given 
the existing theoretical and empirical support for these con¬ 
ditions, we expected to see the evolution of phenotypic plas¬ 
ticity in each of our experimental treatments. However, we 
were unsure of the impact of altering environmental factors 
such as mutation rate and environment fluctuation rate. 

At the end of the experiment, we extracted the dominant 
(most abundant) genotype from the population of each repli¬ 
cate. We tested these genotypes in both ENV-NAND and 
ENV-NOT and recorded each genotype’s expressed pheno¬ 
type across both environments. In Table 2, we report the 
number of replicates in which the dominant genotype at the 
end of the experiment was plastic and the number of repli¬ 
cates in which the dominant genotype was optimally plas¬ 
tic. Note that for these results we only evaluated the most 
abundant genotype at the end of the experiment. An ances¬ 
tor of the evaluated genotype may have been plastic, but if 
that plasticity was not maintained in the lineage, we did not 
count it in Table 2. 

As expected, the capacity for phenotypic plasticity 
evolved in each experimental treatment; in 31 of the 50 base¬ 
line treatment replicates, phenotypic plasticity was present 
in the final dominant organism. None of the final dominant 
genotypes from the control replicates were phenotypically 
plastic. In all control replicates, the dominant genotype 
performed both the NAND and NOT tasks uncondi¬ 
tionally. Our results are consistent with existing theoreti¬ 
cal and empirical work supporting the validity of the condi¬ 
tions likely to facilitate the evolution of phenotypic plastic¬ 
ity (Clune et al., 2007; Ghalambor et al., 2010; Hallsson and 
Bjorklund, 2012; Nolfi et al., 1994). 

How do environmental factors impact the evolution 
of phenotypic plasticity? 

While our results show phenotypic plasticity can evolve un¬ 
der the conditions identified in (Ghalambor et al., 2010), 
how do mutation rate and fluctuation rate affect the evolution 
of phenotypic plasticity under these conditions? We found 
compelling results for both mutation rate and environmental 
cycle length. 

                                 Plastic Replicates     Unconditional Precedes     Sub-optimal Precedes 
                                                        Conditional                Optimal 
Treatment                        Total       Optimal*   NAND Task     NOT Task 
Baseline                         31 (62%)    17 (34%)   31 (100%)     28 (90.3%)   16 (94.1%) 
Low Mutation Rate                38 (76%)    30 (60%)   34 (89.5%)    35 (92.1%)   30 (100%) 
High Mutation Rate               25 (50%)    11 (22%)   25 (100%)     24 (96%)     10 (90.9%) 
Short Environment Cycle Length   36 (72%)    18 (36%)   33 (91.7%)    28 (77.8%)   18 (100%) 
Long Environment Cycle Length    16 (32%)    10 (20%)   14 (87.5%)    16 (100%)    9 (90%) 
Control                          0 (0%)      0 (0%)     -             -            - 

* Optimal is defined as the complete phenotype that only performs the rewarded task in each environment. 

Table 2: A summary of evolutionary outcomes across all five experimental treatments and the control. Plastic Replicates indicates the number 
of replicates (out of 50 per treatment) in which the final dominant genotype was plastic at all (Total) and perfectly plastic (Optimal). 
Unconditional Precedes Conditional indicates the number of times the NAND task and NOT task were expressed unconditionally before 
eventually evolving to be expressed conditionally (out of total plastic). Finally, Sub-optimal Precedes Optimal indicates how many runs had an 
imperfect form of plasticity before eventually evolving to be optimally plastic (out of total optimally plastic). 


Mutation Rate While only of borderline statistical significance 
(p = 0.058 using Fisher’s Exact Test with Bonferroni 
corrections for multiple comparisons; all statistics were 
done in R version 3.2.2 (R Core Team, 2015)), our results 
trend such that populations at lower mutation rates appear 
more likely to evolve phenotypic plasticity than do populations 
at higher mutation rates. The most abundant genotypes 
exhibited some plasticity in 38/50 runs at the low mutation 
rate, 31/50 at the baseline mutation rate, and 25/50 
at the high mutation rate. While higher mutation rates 
increase genetic variation from one generation to the next, 
most mutations that have phenotypic effects are deleterious 
(Sniegowski et al., 2000). Thus, at higher mutation rates, the 
elevated influx of deleterious mutations could increase the 
difficulty of maintaining the necessary genetic machinery 
for phenotypic plasticity. Qualitative evidence for this effect 
can be seen in the time-sliced visualized lineages of final 
dominant, non-plastic genotypes from the high-mutation- 
rate treatment (Figure 3) where lineages traverse states of 
plasticity for some time before reverting back to states of 
non-plasticity (see footnote 3). Furthermore, more frequent 
phenotypic shifts in general increase the probability of quickly 
finding an appropriate non-plastic phenotype after each 
environmental change. 

Environment Fluctuation Rate We found a highly significant 
difference (p = 0.00028 using Fisher’s Exact Test 
with Bonferroni corrections for multiple comparisons) as we 
varied the cycle length for environmental switching. Specifically, 
in the long-environment-cycle-length treatment, only 16/50 runs 
ended with a final dominant genotype that was phenotypically 
plastic, while the baseline and short-environment-cycle-length 
treatments produced 31 and 36 plastic outcomes, respectively. 

3 For fully interactive visualizations of evolved lineages from 
all treatments, see http://cse.msu.edu/~lalejini/evo-origins-of-phenotypic-plasticity-web/lineage_visualization.html 

We expect that the short-environment-cycle-length treat¬ 
ment is biased toward the evolution of phenotypic plastic¬ 
ity because of the rapid environment fluctuations relative to 
other experimental treatments. Rapid fluctuations cause lin¬ 
eages to be less able to rely on mutational input for adapta¬ 
tion. In the long-environment-cycle-length treatment, envi¬ 
ronmental fluctuations may not be occurring rapidly enough 
to produce a sufficient selective pressure for phenotypic 
plasticity, allowing alternative adaptive strategies to evolve 
instead. 
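
For readers who wish to run this kind of comparison themselves, the sketch below implements a two-sided Fisher's exact test for a 2x2 table in pure Python, together with a Bonferroni adjustment, and applies it to the plastic-replicate counts from Table 2. It is an illustration rather than a reproduction of our analysis, which was done in R: the text does not specify which contingency tables were tested or how many comparisons were corrected for, so the pairings and the correction factor below are assumptions, and the resulting values should not be expected to match the reported p-values.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)


# Plastic final dominant genotypes out of 50 replicates (Table 2).
REPLICATES = 50
PLASTIC = {"low": 38, "baseline": 31, "high": 25, "short": 36, "long": 16}

# Assumed pairings, chosen for illustration only.
COMPARISONS = [("low", "high"), ("short", "long")]

adjusted = {}
for first, second in COMPARISONS:
    p = fisher_exact_two_sided(
        PLASTIC[first], REPLICATES - PLASTIC[first],
        PLASTIC[second], REPLICATES - PLASTIC[second],
    )
    adjusted[(first, second)] = bonferroni(p, len(COMPARISONS))
```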

What are the evolutionary stepping stones for 
phenotypic plasticity? 

In an attempt to identify patterns frequently encountered 
during the evolution of phenotypically plastic organisms, 
we extracted and analyzed the full lineages from our ex¬ 
periments. We tested each ancestor genotype in both ENV- 
NAND and ENV-NOT and classified their phenotype across 
both environments. In addition to a quantitative analysis, we 
also visualized the lineages of the dominant, plastic geno¬ 
types; see Figure 4 for the visualization of the baseline treat¬ 
ment. Using our visualizations and ancestor phenotype clas¬ 
sifications, we addressed the following two questions: (1) 
Do the lineages of phenotypically plastic organisms first 
evolve to perform tasks unconditionally before evolving to 
perform them conditionally as a function of their current en¬ 
vironment? And (2), do imperfect forms of phenotypic plas¬ 
ticity tend to precede optimal forms? 

Unconditional Task Performance To explore whether or 
not unconditional task performance was an evolutionary 
stepping stone for conditional task performance (i.e. phenotypic 
plasticity), we determined whether a task was performed 
unconditionally prior to being performed conditionally 
by the ancestors of plastic genotypes. We analyzed both 
tasks - NAND and NOT - separately. These results are reported 
in Table 2. 

Figure 3: Time-sliced visualization of lineages for non-plastic, 
dominant genotypes from the high-mutation-rate treatment. Quick 
color reference: cyan represents unconditional NOT task performance, 
dark blue represents unconditional NAND task performance, 
and red/purple are sub-optimal forms of plasticity. Refer 
to Figure 2 for a full legend of phenotype colors. 

Figure 4: Time-sliced lineage visualization of dominant, plastic 
genotypes from the baseline treatment. Quick color reference: 
cyan represents unconditional NOT task performance, dark blue 
represents unconditional NAND task performance, different shades 
of red/purple are sub-optimal forms of plasticity, and yellow 
represents optimal plasticity. Refer to Figure 2 for a full legend of 
phenotype colors. 

Across all experimental treatments, non-plastic ancestors 
generally preceded plastic ancestors. In other words, uncon¬ 
ditional task performance of the NAND and NOT tasks gen¬ 
erally preceded the conditional performance of either task. 
Examples of this can be seen in time-sliced plastic lineages 
from the baseline treatment (Figure 4) where many lineages 
maintain states of unconditional task expression prior to en¬ 
tering states of conditional task expression. These results 
suggest that, in fluctuating environments similar to those in 
our experiment, the evolutionary path to phenotypic plas¬ 
ticity usually traverses states of unconditional trait expres¬ 
sion prior to entering states of conditional trait expression. 
This result should be unsurprising. In order to evolve a reg¬ 
ulated function, the capacity for both the regulation and the 
function must exist. In our experiment, the function can be 
selected for without regulation; however, regulation of the 
function is unlikely to be selected for without the prior ca¬ 
pacity for the function. 

Sub-optimal Phenotypic Plasticity To investigate sub-optimal 
phenotypic plasticity as an evolutionary stepping 
stone for optimal phenotypic plasticity in our experiment, 
we analyzed lineages of optimally plastic genotypes. Again, 
we consider only complete phenotypes that exclusively per¬ 
form the rewarded task in each environment to be optimal. 
For each optimally plastic genotype’s lineage, we deter¬ 
mined whether or not the evolution of optimal plasticity was 
preceded by the evolution of sub-optimal phenotypic plas¬ 
ticity. The results of this analysis are reported in Table 2. 

Across all experimental treatments, the evolution of sub-optimal 
plasticity did, indeed, generally precede the evolution 
of optimal phenotypic plasticity. Examples of sub-optimal 
plasticity preceding more optimal forms of plasticity 
can be seen in some of the time-sliced lineages from the 
baseline treatment visualized in Figure 4. These results sug¬ 
gest that, in fluctuating environments similar to those in our 
experiment, sub-optimal forms of phenotypic plasticity tend 
to arise before the evolution of optimal forms of phenotypic 
plasticity. 
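
Both stepping-stone questions reduce to a scan along a lineage ordered from the original ancestor to the final genotype. The sketch below shows one way such a scan could be written. The representation (each ancestor as a pair of task sets, one per environment) and the exact decision rules are assumptions made for illustration; they are one reasonable reading of the analysis described above, not a transcription of the scripts used.

```python
# Each ancestor is a pair (tasks performed in ENV-NAND, tasks performed in ENV-NOT).
OPTIMAL = (frozenset({"NAND"}), frozenset({"NOT"}))


def is_plastic(phenotype):
    """A phenotype is plastic if its task profile differs between environments."""
    return phenotype[0] != phenotype[1]


def unconditional_precedes_conditional(lineage, task):
    """True if some ancestor performed `task` in both environments before the
    first ancestor that performed it in exactly one environment."""
    seen_unconditional = False
    for in_env_nand, in_env_not in lineage:
        does_in_nand, does_in_not = task in in_env_nand, task in in_env_not
        if does_in_nand and does_in_not:
            seen_unconditional = True
        elif does_in_nand != does_in_not:
            return seen_unconditional  # first conditional expression
    return False  # the task was never expressed conditionally


def suboptimal_precedes_optimal(lineage):
    """True if a plastic but non-optimal ancestor appears before the first
    optimally plastic ancestor."""
    seen_suboptimal = False
    for phenotype in lineage:
        if phenotype == OPTIMAL:
            return seen_suboptimal
        if is_plastic(phenotype):
            seen_suboptimal = True
    return False  # the lineage never reached optimal plasticity


# Hypothetical lineage: no tasks -> NAND always -> both always
# -> NOT regulated only -> optimal plasticity.
none, nand, not_, both = (
    frozenset(), frozenset({"NAND"}), frozenset({"NOT"}),
    frozenset({"NAND", "NOT"}),
)
example_lineage = [(none, none), (nand, nand), (both, both), (nand, both), (nand, not_)]
```

For the hypothetical lineage above, all three checks (NAND, NOT, and sub-optimal before optimal) return `True`, matching the dominant pattern in Table 2.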

Unconditional trait expression tends to evolve first; then, 
sub-optimal forms of plasticity appear before optimal forms 
finally evolve. While challenging to verify, we expect our 
results to be applicable to biological systems. The evolution 
of complex functions (e.g. optimal phenotypic plasticity) 
builds on simpler, previously evolved functions (e.g. unregulated 
or sub-optimally regulated functions) (Lenski et al., 
2003). These results, however, are particularly useful for 
applied evolutionary computation. If an evolved problem 
solution must respond dynamically to environmental vari¬ 
ables, it is likely that the solution will need to be able to 
traverse through states of rigidity and sub-optimal plastic¬ 
ity prior to reaching a state of optimal plasticity. Thus, first 
evolving rigid solutions in fixed environments and then grad¬ 
ually starting to fluctuate more aspects of the environment 
over time could provide a scaffolding for the evolution of 
optimally plastic solutions. 

Are stochastic strategies evolving as an alternative 
to phenotypic plasticity? 

Stochastic phenotype switching - a form of bet hedging 
(Seger, 1987) - is a common strategy leveraged by bacte¬ 
ria in fluctuating environments (Rainey et al., 2011). Un¬ 
like phenotypic plasticity where environmental conditions 
alter gene expression, stochastic phenotype switching relies 
on mutational input to induce phenotypic changes. This 
strategy is thought to be a viable alternative to phenotypic 
plasticity in the absence of reliable environmental signals 
or when the processing of sensory information is costly 
(Rainey et al., 2011). Strategic stochastic phenotype switch¬ 
ing often relies on contingency loci - hypermutable regions 
of the genome that can induce phenotype switching via mu¬ 
tational input (Moxon et al., 2006). 

Because stochastic phenotype switching is common in bacteria, 
we hypothesized that it would evolve as an alternative 
strategy to phenotypic plasticity in our system. We most expected 
to see stochastic phenotype switching in the experimental 
treatments where the fewest replicates produced 
phenotypically plastic final dominant genotypes. 

Lineage Visualization It can be difficult to intuitively un¬ 
derstand evolutionary strategies leveraged by a lineage with¬ 
out a visual aid. To explore evolutionary strategies alterna¬ 
tive to phenotypic plasticity in fluctuating environments, we 
visualized the lineages of dominant, non-plastic genotypes 
from our experimental treatments. 

If a lineage relied on stochastic phenotype switching, 
we would expect it to switch between phenotypic states of 
unconditional NAND task performance and unconditional 
NOT task performance in approximate synchronization with 
the changing environment. Specifically, we should see 
ancestors along a lineage perform NAND unconditionally 
during periods of ENV-NAND and see ancestors perform¬ 
ing NOT unconditionally during periods of ENV-NOT. We 
show a time-sliced lineage visualization of dominant, non¬ 
plastic genotypes at the end of our experiment for the long- 
environment-cycle-length treatment (Figure 5). 
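
The expectation stated above suggests a simple way to put a number on "approximate synchronization": sample a non-plastic lineage over time and ask how often the task it performs unconditionally is the one currently rewarded. The sketch below is a hypothetical score offered only to make that expectation concrete. It is not a metric used in this work, the function names are ours, and it assumes the run starts in ENV-NAND, which the text does not specify.

```python
def rewarded_task(update, cycle_length, first="NAND", second="NOT"):
    """Task rewarded at `update`, assuming the run starts with `first` rewarded."""
    return first if (update // cycle_length) % 2 == 0 else second


def synchrony(samples, cycle_length):
    """Fraction of samples in which the lineage performs exactly the rewarded task.

    samples -- iterable of (update, tasks) pairs, where `tasks` is the set of
               tasks a non-plastic ancestor performs unconditionally
    """
    samples = list(samples)
    if not samples:
        raise ValueError("at least one sample is required")
    matches = sum(
        1 for update, tasks in samples
        if set(tasks) == {rewarded_task(update, cycle_length)}
    )
    return matches / len(samples)


# Hypothetical lineage sampled once per environmental period (cycle length 100).
switching = [(0, {"NAND"}), (100, {"NOT"}), (200, {"NAND"}), (300, {"NAND"})]
fixed = [(0, {"NAND"}), (100, {"NAND"}), (200, {"NAND"}), (300, {"NAND"})]
```

Under this score, a lineage that never changes phenotype matches the environment about half the time, whereas a lineage that tracks the environment by mutation approaches 1. What threshold would count as evidence of stochastic phenotype switching is left open.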

From Figure 5, we see what appear to be cases of stochastic 
phenotype switching - lineages switching between phenotypic 
states of unconditional NAND task performance and 
unconditional NOT task performance in approximate synchronization 
with the environment. Many of the lineages 
in the long-environment-cycle treatment seem to be undergoing 
stochastic phenotype switching. A few examples of 
what appear to be stochastic phenotype switching can even 
be seen in Figure 4 (the plastic lineages from our baseline 
treatment) between updates 47,500 and 52,500 (the middle 
time-slice), prompting the following open question: in addition 
to being an alternative strategy to plasticity in fluctuating 
environments, could stochastic phenotype switching 
also act as a precursor or building block toward plasticity? 

Figure 5: Time-sliced lineage visualization of non-plastic, dominant 
genotypes from the long environment cycle treatment. Quick 
color reference: cyan represents unconditional NOT task performance, 
dark blue represents unconditional NAND task performance, 
and red/purple are sub-optimal forms of plasticity. Refer 
to Figure 2 for a full legend of phenotype colors. 

Our visualizations only provide an exploratory method 
for understanding evolutionary strategies employed by a lin¬ 
eage. Further analysis would be required to confirm or re¬ 
ject our hypothesis that stochastic phenotype switching is 
evolving as an alternative strategy to phenotypic plasticity 
in our system. This hypothesis is particularly worthwhile 
to explore because our mutation rate was fixed across the 
genome, preventing the evolution of contingency loci. Fur¬ 
thermore, because sensing mechanisms were perfectly ac¬ 
curate, phenotypic plasticity was a reliable strategy. We 
hypothesize that genotypes are moving to a region of the 
mutational landscape that straddles the boundary between 
expressing unconditional NAND task performance and 
unconditional NOT task performance such that minimal 
mutational input is required to switch phenotypes. This type 
of evolutionary trajectory has been demonstrated by Crombach 
and Hogeweg in evolutionary simulations of simple, 
genome-encoded gene regulatory network models (Crombach 
and Hogeweg, 2008). In their simulations, Crombach 
and Hogeweg found that networks evolved in an oscillat¬ 
ing environment possessed genotype to phenotype mappings 
that were mutationally more efficient at generating adaptive 
phenotypes in alternative environments. 

Conclusion 

In this work, we evolved populations of phenotypically plas¬ 
tic organisms at varied rates of environmental fluctuation 
and mutation using the Avida Digital Evolution Platform. 
We analyzed the lineages of evolved genotypes for clues 
about the evolutionary stepping stones toward phenotypic 
plasticity. We found that the capacity for phenotypic plastic¬ 
ity evolved under conditions identified by previous research 
(Clune et al., 2007; Ghalambor et al., 2010). We found 
evidence that traits are generally expressed unconditionally 
prior to the evolution of conditional trait expression and that 
sub-optimal forms of phenotypic plasticity generally evolve 
before optimal forms of phenotypic plasticity. Both of these 
results are examples of evolution’s use of simpler functions 
as building blocks for more complex functions as in Lenski 
et al. (Lenski et al., 2003). 

Visual inspection of the evolutionary histories leading to 
phenotypically plastic organisms suggests that under certain 
conditions stochastic phenotype switching evolves as an al¬ 
ternative strategy to phenotypic plasticity, just as it does in 
many bacteria (Moxon et al., 2006; Rainey et al., 2011). Of 
course, in these bacterial cases, hypermutable sites (called 
“contingency loci”) that facilitate such switching tend to 
appear in the genomes. 

Given these promising results, we plan to explore whether
stochastic phenotype switching can be a viable evolutionary
strategy in the absence of the ability to evolve hypermutable
regions of the genome. Given the potential difficulty
in maintaining the necessary genetic machinery associated
with phenotypic plasticity, are there cases in which
stochastic phenotype switching is more robust than phenotypic
plasticity? And does this contribute to the evolution
of stochastic phenotype switching as an evolutionary strategy?
Metrics are clearly needed for quantifying stochastic
phenotype switching in digital systems and for evaluating
the mutational landscapes of genotypes along a lineage.
Abstract

Hebbian plasticity in artificial neural networks is compelling
for both its simplicity and biological plausibility. Changing the
weight of a connection based only on the activations of the neurons
it connects is straightforward and effective in combination
with neuromodulation for reinforcing good behaviors. However,
a major obstacle to any ambitious application of Hebbian
plasticity is that the performance of a layer of Hebbian neurons
is highly sensitive to the choice of inputs. If the inputs
do not represent precisely the features of the environment that
Hebbian connections must learn to correlate to actions, the
network will struggle to learn at all. A recently-proposed solution
to this problem is the Real-time Autoencoder-Augmented
Hebbian Network (RAAHN), which inserts an autoencoder
between the inputs and the Hebbian layer. This autoencoder
then learns in real time to encode the raw inputs into higher-level
features while the Hebbian connections in turn learn to
correlate these higher-level features to correct actions. Until
now, RAAHN has only been demonstrated to work when it is
driven by an autopilot during training (in a robot navigation
task), which means its experiences are carefully controlled.
Progressing significantly beyond this early demonstration, the
present investigation now shows how RAAHN can learn to
navigate from scratch entirely on its own, without an autopilot.
By removing the need for an autopilot, RAAHN becomes a
powerful new Hebbian-centered approach to learning from
sparse reinforcement with broad potential applications.

Introduction 

As a key mechanism behind adaptation in natural organisms,
neural plasticity has attracted significant interest in artificial
life (alife) (Floreano and Urzelai, 2000; Niv et al., 2002;
Soltoggio et al., 2008, 2007; Soltoggio and Jones, 2009;
Soltoggio and Stanley, 2012; Risi et al., 2011; Risi and Stanley,
2012; Stanley et al., 2003; Coleman and Blair, 2012).
A popular option for studying neural plasticity in artificial
neural networks (ANNs) is Hebbian learning, which follows
the simple mechanism of increasing connection weights
proportionally to the activation strengths of the neurons they
connect (Hebb, 1949). For example, researchers often incorporate
Hebbian learning into evolutionary algorithms that
evolve ANNs to control agents in dynamic or uncertain environments
(Floreano and Urzelai, 2000; Soltoggio et al., 2008;
Risi et al., 2011). Sometimes such Hebbian networks are
accompanied by neuromodulation (Soltoggio et al., 2008, 2007;
Soltoggio and Jones, 2009; Soltoggio and Stanley, 2012; Risi
and Stanley, 2012; Coleman and Blair, 2012), which allows a
reward or penalty signal to turn on or off the plasticity of Hebbian
connections appropriately. However, a major obstacle to
the success of Hebbian ANNs is that the inputs to Hebbian
layers must be carefully selected to encompass the right incoming
environmental features or the proper correlations will
otherwise become too difficult to learn. In domains in which
the right features may not be known a priori, or where it may
even be necessary to learn them from raw inputs through
experience, Hebbian learning thereby becomes brittle or even
prohibitive.

Responding to this challenge, Pugh et al. (2014) proposed
recently that it might be possible to bridge the gap between
the raw inputs to an ANN and a Hebbian layer through an
autoencoder, which is itself an ANN with at least one hidden
layer that is trained to reconstruct its inputs (Bengio, 2009).
The value of the hidden layer in the autoencoder is that it
typically comes to represent higher-level features of the environment
(because they then aid in the reconstruction of the
inputs). These higher-level features distilled from raw inputs
could be just the features needed by a Hebbian layer to learn
correlations between environmental features and appropriate
agent actions. The idea behind Pugh et al. (2014) is that in
principle both an autoencoder layer and a neuromodulated
Hebbian layer can be learned simultaneously, in real time,
which would allow an agent to construct a higher-level representation
of its environment at the same time as it learns to
navigate based on that developing representation. Pugh et al.
(2014) call this hybrid combination of autoencoder and Hebbian
layers a Real-time Autoencoder-Augmented Hebbian
Network (RAAHN).

To validate RAAHN, Pugh et al. (2014) showed that it can
learn key features of a two-dimensional maze domain at the
same time as learning to navigate the domain in real time.
Furthermore, a pure Hebbian learner could not effectively
learn the same policy, thereby confirming the advantage of
Hebbian learning from the autoencoder layer. However, a
major limitation of this demonstration is that the agent was
guided during learning by an autopilot that ensured that the
agent experienced a prescripted succession of inputs identified
by the experimenters as appropriate to the task. In this
way, the autopilot phase resembles supervised learning more
than the kind of autonomous exploratory learning one might
hope to see in alife. Ideally, the agent would explore on
its own, accumulating higher-level features as it goes, and
improving its ability to navigate based on those features at
the same time.

The aim of this paper is to take that next step, demonstrating
that RAAHN is indeed capable of doing all
the learning on its own, without an autopilot to guide it. Such
a result would open up a broad range of possible experiments
and applications, where agents can be released into a world
to explore and learn without explicit guidance, more in the
spirit of reinforcement learning (RL) (Watkins and Dayan,
1992; Rummery and Niranjan, 1994) than supervised learning.
To enable this capability, RAAHN is slightly elaborated
through a new kind of novelty-based history buffer (which
decides from which experiences it is trained) and neural noise
to encourage autonomous exploration. The result, demonstrated
in a two-dimensional maze domain, is ultimately that
RAAHN can learn effectively on its own, and furthermore
that RAAHN can learn even when the provided sensors are
insufficient for neuromodulated Hebbian learning.

With RAAHN’s ability to learn control policies and higher-level
features in real time established, RAAHN can begin
to be employed in more sophisticated unguided scenarios.
RAAHN also goes beyond traditional RL algorithms
(Watkins and Dayan, 1992; Rummery and Niranjan, 1994)
because it has the potential to learn increasingly high-level
features through stacking autoencoders (Le et al., 2012) in
the future. Such autoencoder stacks and entire RAAHN architectures
potentially can even be evolved in the future through
neuroevolution (Stanley and Miikkulainen, 2002; Floreano
et al., 2008; Yao, 1999). As a first step towards such ends, this
study accordingly establishes best practices for successfully
running RAAHN without the need for an autopilot.

Background 

Before turning to the original work on RAAHN, this section
reviews Hebbian learning and autoencoders,
which are the two core components of RAAHN.

Hebbian Learning 

Basic Hebbian learning is implemented in ANNs with the 
simple learning rule 

$$\Delta w_i = \eta x_i y, \tag{1}$$

where $w_i$ is the weight connecting two neurons with activations
$x_i$ and $y$, and $\eta$ is the learning rate. This learning rule
has the advantage of being completely local, making its application
flexible. It is also biologically motivated, reflecting
basic principles of neural plasticity.
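
The locality of equation 1 can be illustrated with a minimal Python sketch. The function name `hebbian_update`, the example activations, and the learning rate of 0.1 are assumptions chosen for illustration rather than values taken from the experiments.

```python
def hebbian_update(weights, inputs, output, eta=0.1):
    """Apply equation 1 to every incoming weight of one output neuron."""
    # Each weight changes by eta * x_i * y, using only the activations of
    # the two neurons it connects; no global error signal is involved.
    return [w + eta * x * output for w, x in zip(weights, inputs)]


# Hypothetical example: only the weight whose input is active changes.
updated = hebbian_update([0.0, 0.5], inputs=[1.0, 0.0], output=0.8)
```

Because the second input is silent in this example, its weight is left untouched, which is the sense in which the rule is local.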

Researchers interested in evolving ANNs in particular took
interest in the Hebbian rule as a means of allowing evolved
ANNs to exhibit plasticity during their lifetime (Floreano and
Urzelai, 2000; Niv et al., 2002; Risi and Stanley, 2010; Risi
et al., 2011; Stanley et al., 2003). Furthermore, by adding
neuromodulation to the basic Hebbian learning rule, Hebbian
ANNs can be trained with rewards and penalties similar
to reinforcement learning algorithms (Watkins and Dayan,
1992). Neuromodulation allows an experience to strengthen,
weaken, or have no effect on learned Hebbian correlations
by associating a modulatory signal with the experience. The
modulatory signal can be calibrated to guide learning based
on an agent’s behavior within its environment. Interestingly,
neuromodulation has been shown to elicit agent behavior
reminiscent of operant conditioning in animals (Soltoggio
and Stanley, 2012; Soltoggio et al., 2013). Researchers in
neuroevolution and artificial life have also shown that evolutionary
algorithms benefit from Hebbian learning combined
with neuromodulation by allowing their discovered ANNs
to learn from reward signals (Soltoggio et al., 2008, 2007;
Soltoggio and Jones, 2009; Soltoggio and Stanley, 2012; Risi
and Stanley, 2012; Coleman and Blair, 2012).

A major obstacle to building a general learning system 
around the Hebbian rule is that its success depends greatly 
upon receiving inputs that correspond to precisely the domain 
features necessary to learn the right correlations for the task 
(Field, 1994). RAAHN introduced the idea of placing an 
autoencoder, reviewed next, before the Hebbian layer so that 
such essential features can be learned from raw inputs without 
the need for human engineering. 

Autoencoders 

An autoencoder is an ANN that is trained to learn a feature
representation of its inputs that is conceptually at a higher
level. For example, edge detectors are a higher-level feature
of images than raw pixels (Hinton and Salakhutdinov, 2006).
The autoencoder achieves such representation by encoding its
inputs in a hidden layer (the learned feature representation)
that is then decoded by an output layer (of the same dimensionality
as the input layer) representing the autoencoder’s
reconstruction of its inputs. The autoencoder is trained to
minimize the error between its reconstruction and its inputs
(Bengio et al., 2013). Rather than using separate sets of weights
for the encoder and the decoder, the same set of
weights can compute both the encoding and reconstruction,
an arrangement called tied weights (Vincent, 2011).

Deep learning researchers attracted fresh interest in autoencoders
by showing that they can be stacked into layers
that learn increasingly high-level features (Le et al., 2012).
That way, for example, raw inputs can be encoded into edge
detectors, which can be encoded into increasingly high-level
concepts until face detectors arise. There are many ways to
train autoencoders, and many heuristics to help them learn
meaningful feature representations (Ranzato et al., 2006; Le
et al., 2012), but for RAAHN the precise autoencoder implementation
is not the key concern because in theory any
autoencoder can be plugged into RAAHN, so as autoencoders
improve, RAAHN also improves.

RAAHN 

RAAHN was recently introduced by Pugh et al. (2014), who
were motivated by the limitations of Hebbian learning to combine
an autoencoder with the Hebbian layer so that Hebbian
correlations could be learned from the features extracted by
the autoencoder. The idea is that in theory both the features
and the neuromodulated Hebbian correlations can be learned
in real time, as an agent navigates its environment, thereby
offering an appealing new approach to reinforcement-like
learning. However, this original work relied on an initial
autopilot phase for RAAHN to learn a feature set representative
of the domain. The autopilot in effect ensures that the
feature set learned by the autoencoder is reliable and consistent
because the learner is forced to encounter a prescripted
chronology of experiences. While this setup helps to demonstrate
that RAAHN can learn in principle, it is in effect a
form of supervised learning, which leaves open the question
of whether RAAHN can really learn on its own in real time.

Approach 

The key hypothesis driving this paper is that RAAHN can
still perform well in real time even without an initial autopilot
phase if the autoencoder component learns a good feature set.
Most of the original RAAHN setup from Pugh et al. (2014)
does not need to change, but there are several proposed implementation
differences to support fully autonomous learning.
The experiments in this paper will apply RAAHN to a similar
two-dimensional agent navigation task to that in Pugh et al.
(2014), where the agent learns on every simulation tick.

Autoencoder Component 

The autoencoder implementation follows conventional practice
(Hinton and Salakhutdinov, 2006; Bengio, 2009). In
particular, it computes neural activations with tied weights
trained with stochastic gradient descent and error backpropagation.
Recall that the aim of an autoencoder is to reproduce
its own inputs. That way, its hidden layer becomes an encoding
of the input space that captures its essential underlying
features. The forward activation $A_j$ for a hidden neuron $j$
with input neurons $I$ is calculated with

$$A_j = \sigma\Big(\sum_{i \in I} A_i w_{ij}\Big), \tag{2}$$

where $\sigma$ is the activation function (in this paper the logistic
function), $A_i$ is the activation of a given input neuron $i \in I$,
and $w_{ij}$ is the weight between input neuron $i$ and hidden
neuron $j$. After calculating the forward activations, the backward
activation $B_i$ (whereby the reconstruction is computed)
for every input neuron is calculated with

$$B_i = \sigma\Big(\sum_{j \in H} A_j w_{ij}\Big), \tag{3}$$

where $\sigma$ is still the logistic function, and $A_j$ is the previously
computed activation for a hidden neuron $j \in H$.

The backward activations thus serve as reconstructions of
the original inputs. With these reconstructions, the reconstruction
error $E_i$ can be calculated for each input $i$:

$$E_i = A_i - B_i. \tag{4}$$

From these error values the delta $\delta_i$ is calculated for each
input neuron $i$, which will help to compute backpropagated
error:

$$\delta_i = E_i \cdot \sigma'(B_i), \tag{5}$$

where $\sigma'(B_i)$ is the derivative of the logistic function at the
reconstruction $B_i$ for the input neuron $i$. Now the delta for a
given hidden neuron $j$ can be calculated as

$$\delta_j = \Big(\sum_{i \in I} \delta_i \cdot w_{ij}\Big) \cdot \sigma'(A_j), \tag{6}$$

where $\delta_i$ is from equation 5, $w_{ij}$ is from equation 2, and
$\sigma'(A_j)$ is the derivative of the logistic function at the forward
pass $A_j$ for hidden neuron $j$.

With the original error deltas $\delta_i$ and backpropagated error
deltas $\delta_j$, the tied weights of the autoencoder component can
be updated. The change in weight $\Delta w_{ij}$ for the connection
from input neuron $i$ to hidden neuron $j$ is then

$$\Delta w_{ij} = \alpha (\delta_i A_j + \delta_j A_i), \tag{7}$$

where $\alpha$ is the learning rate (held constant in later experiments
at 0.1), and the remaining variables are as described above.

In RAAHN this autoencoder is trained in real time as the
agent explores its world, raising the question of on what data
it should be trained at any given tick. To address this question
a history buffer saves $n$ experiences (sets of rangefinder
values) that the agent encounters in the domain. However,
the method through which these $n$ experiences are chosen
turns out to be instrumental in facilitating the novel real-time
exploratory autonomy of RAAHN investigated in this paper, as
explained next.
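
As an illustration of the autoencoder training step in equations 2 through 7, the following Python sketch performs one tied-weight update on a single experience. It is a sketch under stated assumptions: the list-of-lists weight layout, the function names, the random initialization range, and the example experience are hypothetical, and the logistic derivative is evaluated from the neuron's output as out * (1 - out).

```python
import math
import random


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def autoencoder_step(w, a_in, alpha=0.1):
    """One tied-weight update; w[i][j] links input i to hidden neuron j.

    Updates w in place and returns the summed squared reconstruction
    error measured before the update.
    """
    n_in, n_hid = len(w), len(w[0])
    # Equation 2: forward (hidden) activations.
    a_hid = [logistic(sum(a_in[i] * w[i][j] for i in range(n_in)))
             for j in range(n_hid)]
    # Equation 3: backward activations, reusing the same (tied) weights.
    b = [logistic(sum(a_hid[j] * w[i][j] for j in range(n_hid)))
         for i in range(n_in)]
    # Equation 4: reconstruction error per input.
    err = [a_in[i] - b[i] for i in range(n_in)]
    # Equation 5: input-side deltas; logistic derivative is b * (1 - b).
    d_in = [err[i] * b[i] * (1.0 - b[i]) for i in range(n_in)]
    # Equation 6: deltas backpropagated to the hidden neurons.
    d_hid = [sum(d_in[i] * w[i][j] for i in range(n_in))
             * a_hid[j] * (1.0 - a_hid[j]) for j in range(n_hid)]
    # Equation 7: one combined update because the weights are tied.
    for i in range(n_in):
        for j in range(n_hid):
            w[i][j] += alpha * (d_in[i] * a_hid[j] + d_hid[j] * a_in[i])
    return sum(e * e for e in err)


# Hypothetical usage: 11 inputs, 5 hidden neurons, one repeated experience.
rng = random.Random(0)
weights = [[rng.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(11)]
experience = [rng.random() for _ in range(11)]
errors = [autoencoder_step(weights, experience) for _ in range(200)]
```

Repeating the step on the same experience should drive the reconstruction error down, which is a quick way to check that the signs in equations 4 through 7 are mutually consistent.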


History Buffer Management 

A simple way to manage the history buffer is to save every
experience as it is discovered and drop the oldest ones
when the buffer is at capacity. This method is called Queue-RAAHN
because the history buffer is in effect managed as
a queue. It gives RAAHN the ability to learn a feature set
from past experiences in the domain, but it also has the significant
downside that the history buffer may accumulate too
much of a single kind of experience (e.g. crash experiences),
eventually causing it to “forget” other important aspects of
the domain. While this approach worked in research by
Pugh et al. (2014) under the controlled environment of the
autopilot, when RAAHN is allowed to run autonomously the
chance of a succession of negative experiences like crashes
is much higher.
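
The flooding problem can be seen in a few lines of Python using the standard library's bounded `deque`. The capacity of three and the two-value experiences are hypothetical, chosen only to keep the example small.

```python
from collections import deque

# Hypothetical capacity of 3 so that the effect is visible immediately.
queue_buffer = deque(maxlen=3)

varied = [[0.1, 0.2], [0.3, 0.4]]
crashes = [[0.9, 0.9], [0.9, 0.9], [0.9, 0.9]]

for experience in varied + crashes:
    # Appending to a full deque silently discards the oldest entry.
    queue_buffer.append(experience)
```

After the three repeated crash experiences arrive, the earlier varied experiences are gone, so an autoencoder trained from this buffer would see only crashes.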

To address this problem the history buffer can instead
be managed by saving only the most novel experiences
encountered during the agent’s lifetime. That way, commonly
repeated experiences (such as crashing into a wall) will not
come to occupy the entire buffer. This new method is called
Novelty-RAAHN. To determine the novelty of an experience
the Euclidean distance is calculated between it and all the
other experiences in the buffer. Inspired by the calculation of
novelty in the novelty search algorithm (Lehman and Stanley,
2011), the experience is then assigned a novelty score, which
is the sum of the 20 smallest such distances. The current
experience is then added to the buffer only if its novelty score
is greater than that of the least novel experience in the buffer,
which it replaces. That way, the buffer fills with diverse
rather than redundant experiences that give the autoencoder
a representative sample of the entire domain.
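
A naive Python sketch of this novelty-based buffer follows. Several details are assumptions rather than facts from the text: the names `novelty_score` and `offer` are hypothetical, every experience is kept while the buffer is below capacity, and the scores of stored experiences are recomputed against the other stored experiences on every call (a real-time implementation would presumably cache them).

```python
import math


def novelty_score(experience, others, k=20):
    """Sum of the k smallest Euclidean distances from experience to others."""
    distances = sorted(math.dist(experience, other) for other in others)
    return sum(distances[:k])


def offer(buffer, experience, capacity=500, k=20):
    """Store experience if it beats the least novel stored experience.

    Returns True when the buffer changed.
    """
    if len(buffer) < capacity:
        # Assumption: while the buffer is not yet full, everything is kept.
        buffer.append(experience)
        return True
    # Assumption: a stored experience is scored against the other entries.
    scores = [novelty_score(e, buffer[:i] + buffer[i + 1:], k)
              for i, e in enumerate(buffer)]
    least = min(range(len(buffer)), key=scores.__getitem__)
    if novelty_score(experience, buffer, k) > scores[least]:
        buffer[least] = experience
        return True
    return False


# Hypothetical one-dimensional example with a full buffer of four entries.
demo = [[0.0], [0.0], [0.0], [1.0]]
kept_duplicate = offer(demo, [0.0], capacity=4, k=2)
kept_novel = offer(demo, [0.5], capacity=4, k=2)
```

In the example, another copy of a repeated experience is rejected, while an experience unlike anything stored displaces one of the redundant entries.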

Hebbian Component 

The Hebbian component of RAAHN, which learns a controller
based on the features from the autoencoder, trains with
the reconfigure and saturate method described by Soltoggio
and Stanley (2012). This method adds a modulatory signal
(which can either be positive or negative) to the basic Hebbian
learning rule, thus giving reward and penalty feedback
to the otherwise-naive Hebbian component. The modulatory
signal $m$ influences the weights through the update equation

$$\Delta w_i = m \eta x_i y, \tag{8}$$

where $\eta$ is the learning rate, and $x_i$ and $y$ are the activations of
the two neurons connected by the weight $w_i$. As prescribed
by reconfigure and saturate, noise is added to each output
and weight delta to facilitate exploration (which is essential
when there is no autopilot). Thus the noisy activation $A_j$ for
each output neuron is computed as



where $\xi_j$ is the neural noise associated with $A_j$. Noise is
added to each weight delta with

$$\Delta w_i = m \eta x_i y + \xi_i, \tag{10}$$

where $\xi_i$ is noise sampled from the same distribution as in
equation 9 (in this paper: a uniform distribution in the range
$[-0.1, 0.1]$).
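
The noisy modulated weight update of equation 10 can be sketched in Python as follows. The function name and argument layout are hypothetical, and the sketch covers only the weight update given in the text, not any other part of the reconfigure and saturate method.

```python
import random


def modulated_hebbian_update(weights, inputs, output, m, eta=1.0,
                             noise=0.1, rng=random):
    """Equation 10: each weight changes by m * eta * x_i * y + xi_i."""
    # xi_i is drawn independently per weight from uniform[-noise, noise].
    return [w + m * eta * x * output + rng.uniform(-noise, noise)
            for w, x in zip(weights, inputs)]


# Hypothetical example with the noise switched off (equation 8): a penalty
# (m < 0) weakens weights in proportion to the input activations.
penalized = modulated_hebbian_update([0.2, 0.2], [1.0, 0.5], output=1.0,
                                     m=-1.0, noise=0.0)
```

With a positive `m` the same experience would strengthen the weights instead, and with `m` equal to zero only the exploratory noise would remain.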


Hebbian and RAAHN Architectures 

The experiments in this paper compare pure Hebbian learning
and RAAHN in a two-dimensional agent navigation domain.
The raw inputs for both pure Hebbian and RAAHN come
from 11 simulated wall-sensing rangefinders. Both network
architectures output a single value denoting the fraction of
the maximal turn angle to steer the agent. The pure Hebbian
architecture is simple in that it merely connects the raw
inputs directly to the output with a single layer of Hebbian-trained
connections (figure 1a). The RAAHN architecture
includes two layers of weights and an intervening five-neuron
hidden layer (figure 1b), where the first layer of weights is
trained by the autoencoder learning rule and the second layer
is Hebbian-trained. Therefore, RAAHN’s Hebbian component
essentially learns driving behavior from a set of five
higher-level features extracted from the raw inputs by the
autoencoder component. The hidden layer size of five, to which
preliminary experiments suggest performance is not sensitive
under minor variation, is smaller than the 11 inputs so that the
autoencoder cannot simply learn the identity function.

Experiments and Results 

Recall that the hope in this paper is to advance beyond the previous
finding that RAAHN can navigate a two-dimensional
maze domain with an initial autopilot-driven training phase
(Pugh et al., 2014). Although that study established that
RAAHN can learn features as it learns to control, the more
exciting potential of RAAHN is the ability to learn by itself,
without an autopilot, in the spirit of real organisms.

By promising to learn new features at the same time as it
learns a control policy even without any preliminary training,
RAAHN adds a novel capability over and above what
modulated Hebbian plasticity can offer. However, that new
capability also raises the possibility that RAAHN might be
overall more difficult to train without the help of the autopilot.
For that reason, the experiments that follow establish first
that RAAHN remains competitive with Hebbian on problems
that Hebbian can solve. Once that is established, showing
in effect that the new capabilities of RAAHN cost very little,
the next logical step is an experiment that shows that on
some problems (where feature learning is essential), RAAHN
becomes critical to making effective learning possible.

Queue-RAAHN Experiment 

To explore the potential of RAAHN to learn through its own 
exploration, a two-dimensional maze experiment similar to 
the one carried out by Pugh et al. (2014) is conducted without 
the initial autopilot. Instead, the agent is allowed to learn 
from its own decisions as it explores the world, as described 
in the Approach section. 

Queue-RAAHN manages its history buffer as a queue. Every
experience is saved in real time and the oldest experiences
are deleted when the buffer is at capacity. Recall that the
experiences in the buffer will periodically train the autoencoder
in real time. This simple queue-based approach to storing
experiences gives the agent memory of previous experiences
so it does not immediately forget them.

[Figure 1 panels: (a) Hebbian Architecture, (b) RAAHN Architecture; output node labeled "Steering Output"]

Figure 1: Hebbian and RAAHN Architectures. The Hebbian architecture (a) simply connects the 11 rangefinder inputs to the
one output corresponding to the agent’s steering. The RAAHN architecture (b) introduces an autoencoder layer between the
rangefinder inputs and the steering output. The first layer of connections is trained as the autoencoder with tied weights, to learn
(in the experiment) five high-level features from the rangefinder inputs. These features are then connected in RAAHN to the
steering output with Hebbian-trained connections.

To test this approach (and others to be introduced shortly),
the agent is given 11 rangefinders (depicted in figure 2) to
detect how close it is to nearby walls. The rangefinders are
separated from each other by an angle of 18 degrees. The two
rangefinders at the edges of the range are 90 degrees from
the middle rangefinder. Each rangefinder in this first experiment
is 350 units long; the whole track is 3,140 units from
left to right and 2,160 units from top to bottom. Rangefinders
produce a minimum activation of 0.0 when they do not
intersect walls, and a maximum activation of 1.0 when the entire
rangefinder intersects a wall. Intermediate intersections
produce an activation between 0.0 and 1.0.
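
The sensor layout and activation range can be sketched in Python as below. The text fixes only the two endpoints of the activation (0.0 and 1.0), so the linear ramp between them is an assumption, as are the function names and the convention that `None` means no wall was hit.

```python
def rangefinder_angles(count=11, spacing_deg=18.0):
    """Sensor angles relative to the heading, symmetric about the middle."""
    middle = (count - 1) / 2
    return [(i - middle) * spacing_deg for i in range(count)]


def rangefinder_activation(hit_distance, length=350.0):
    """Map the distance to the nearest wall along a sensor to [0.0, 1.0].

    hit_distance is None when the sensor intersects no wall. The linear
    ramp between the two documented endpoints is an assumption.
    """
    if hit_distance is None or hit_distance >= length:
        return 0.0
    return 1.0 - max(hit_distance, 0.0) / length


angles = rangefinder_angles()
```

With 11 sensors spaced 18 degrees apart, the outermost sensors fall at 90 degrees to either side of the middle one, matching the description above.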

Queue-RAAHN is compared to a single-layer neural network
with Hebbian connections. The aim is to show that
the autoencoder layers in RAAHN, which are lacking in the
Hebbian network, do not diminish the ability of RAAHN to
learn on its own in real time compared to the Hebbian layer
alone. The agents controlled by Queue-RAAHN and the Hebbian
neural network are referred to as Queue-RAAHN and
Hebbian, respectively. Both neural network topologies take
the 11 inputs and produce one output denoting the turning
angle, with a range of [-2.0, 2.0]. A turn angle output of 2.0
changes the agent’s direction by 2.0 degrees for the given tick.
The neural network topology of Queue-RAAHN includes an
autoencoder with a hidden layer of five neurons to learn features
from the 11 rangefinder inputs, as depicted in figure 1b.
The history buffer size for Queue-RAAHN is 500. For both
methods Hebbian training occurs once every tick based on
only the most recent experience. In Queue-RAAHN the autoencoder
component also trains on 20 randomly-selected experiences
from the history buffer (which for Queue-RAAHN
is of course managed as a queue). The learning rates for
autoencoder and Hebbian training (for the Hebbian layer of
RAAHN and the pure Hebbian network) are held constant
at 0.1 and 1.0 respectively. Both the single-layer Hebbian
network and the Hebbian component of Queue-RAAHN receive
positive modulation for turning away from walls, and
negative modulation for turning towards walls, in the range
of [-1.0, 1.0], where the magnitude of modulation is proportional
to the magnitude of the turn. The wall chosen for this
calculation is the closest wall colliding with an imaginary
line projecting from the center of the agent 400 units in the
direction the agent is facing. If the imaginary line does not
intersect any wall then modulation is zero.
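
The per-tick training schedule described above can be summarized in a short Python sketch. The callables `train_hebbian` and `train_autoencoder` are hypothetical stand-ins for the two learning rules, and sampling with replacement is an assumption made so that a buffer holding fewer than 20 experiences can still be used.

```python
import random


def training_tick(experience, buffer, train_hebbian, train_autoencoder,
                  samples=20, rng=random):
    """Run one tick of the training schedule.

    train_hebbian and train_autoencoder are hypothetical callables that
    each take a single experience.
    """
    # Hebbian training sees only the most recent experience.
    train_hebbian(experience)
    # The autoencoder replays randomly selected stored experiences.
    # Assumption: sampling is with replacement.
    if buffer:
        for replayed in rng.choices(buffer, k=samples):
            train_autoencoder(replayed)


# Hypothetical usage with list.append standing in for the learning rules.
hebbian_calls, autoencoder_calls = [], []
stored = [[0.2], [0.3]]
training_tick([0.1], stored, hebbian_calls.append, autoencoder_calls.append,
              rng=random.Random(1))
```

Keeping the two updates separate in this way reflects that the Hebbian layer learns from the present moment while the autoencoder learns from the buffer's memory of the domain.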

Figure 2: X-Shaped Domain. The track is not uniform, which
ensures that the agent cannot succeed by simply repeating one
behavior. The red dot in the center is used to measure agent
performance; every time the agent completes a circle around
the dot, it completes one lap.

The simulation is run 200 times for both methods, each
time for 10,000 ticks (i.e. simulation state updates). Agents
that fail to complete at least one lap are complete failures.
Agents that complete at least one lap but fewer than four
laps are partial failures because their performance is less
than half that of correctly-driving agents.
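
The failure categories can be expressed as a small Python function. The label "success" and the treatment of a run with exactly one lap as a partial failure are assumptions made for this sketch.

```python
def classify_run(laps):
    """Label a run by its number of completed laps."""
    if laps < 1:
        return "complete failure"
    # Assumption: exactly one completed lap counts as a partial failure.
    if laps < 4:
        return "partial failure"
    return "success"


labels = [classify_run(laps) for laps in (0, 2, 9)]
```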

Hebbian on average completes 9.2 laps and exhibits neither
complete nor partial failures. Queue-RAAHN, in contrast,
completes 7.8 laps on average (significantly
fewer; p < 0.01, Student’s t-test) and yields 14%
complete failures and a further 1% partial failures. The failure
of Queue-RAAHN to approximate the performance of the
plain Hebbian network suggests that the queue-based learning
approach does not enable RAAHN to approach optimal
performance.

Observing runs visually in real time reveals that after colliding
with a wall, Queue-RAAHN agents either escape after
a few hundred ticks or else become stuck for the duration of
the run. Interestingly, an analysis of Queue-RAAHN neural
networks suggests that this behavior results from poor driving
compounded with poor representation. That is, sometimes
agents drive poorly initially, which leads them to fill their history
buffer with only crash experiences. This misadventure
then leads to learning a poor feature set also representative
only of crash experiences. The poor feature set then makes
it difficult for such agents to escape their perpetual crashing
and learn the types of better driving behaviors employed by
agents who complete more laps.

Thus one hypothesis is that the main problem with Queue- 
RAAHN is that the autoencoder is unable to learn a feature set 
that is representative of the entire domain. If this hypothesis 
is true then if the feature set is constrained to be novel and 
therefore representative of the entire domain, RAAHN should 
perform about as well as Hebbian, which is tested next. 

Novelty-RAAHN Experiment 

By constraining the history buffer of RAAHN to contain only
the most novel experiences, the data set on which RAAHN
trains can become representative of the entire domain. For
example, when the agent crashes into a wall, because crash
experiences tend to be similar, they are not able to flood the
buffer the way they do in Queue-RAAHN. By thereby avoiding
a buffer with only one or few kinds of experience, the
history buffer accumulates experiences of the entire domain
as the agent explores it. This experiment uses the same parameters
as the previous experiment aside from the difference
in history buffer management. The novelty constraint on the
history buffer still allows the simulation to run in real-time,
as can be observed in the source available at:

http://eplex.cs.ucf.edu/uncategorised/software 

The simulation is run 200 times for 10,000 ticks each. On
average Novelty-RAAHN completes 8.8 laps, which is significantly
above the 7.8 laps achieved by Queue-RAAHN
(p < 0.05; Student’s t-test). While the 8.8 laps of Novelty-RAAHN
is still significantly below the 9.2 of Hebbian alone
(p < 0.01; Student’s t-test), this difference is small (less than
one lap), and moreover some small disparity is unavoidable in
practice because Novelty-RAAHN must consume some extra
time at the beginning of each run acquiring a novel set of
experiences. Novelty-RAAHN also suffers only 0.5% complete
failures and no partial failures. Thus it is likely close
to performing as well as possible for a method that learns
both features and policy at the same time. In conclusion, in
this task in which the Hebbian network is provided a good
input representation, RAAHN can learn in real time to approximate
the same performance all while learning its feature
representation in real time as well.

Increased Rangefinder Length Experiment 

While it is important to establish that RAAHN can learn a
representation in real time competitive with Hebbian alone,
RAAHN’s real promise is to learn better feature representations
that overcome the limitations of the raw sensors. One
way to investigate this idea is to degrade the quality of the
rangefinders by increasing their length. Such an increase
makes distinguishing different situations more difficult by
forcing more sensory input into the agent as the sensors intersect
walls more frequently and with greater activation. In
theory RAAHN can overcome this challenge to some extent
because it learns a new representation from the sensory input.
However, Hebbian is forced to learn from degraded inputs.

To investigate whether RAAHN can indeed gain an advantage
by learning a new feature representation, this experiment
tests Hebbian and Novelty-RAAHN over ten variations of
rangefinder lengths. These variants range from 10% longer to
100% longer with an interval of 10% between each variant.
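
For concreteness, the ten variant lengths can be computed as follows, assuming that the percentages are taken relative to the 350-unit rangefinders of the earlier experiments. The names `BASE_LENGTH` and `variant_lengths` are hypothetical.

```python
BASE_LENGTH = 350  # rangefinder length used in the earlier experiments

# Ten variants, 10% to 100% longer in steps of 10%. Integer arithmetic
# keeps the lengths exact.
variant_lengths = {pct: BASE_LENGTH * (100 + pct) // 100
                   for pct in range(10, 101, 10)}
```

Under that assumption, the shortest variant is 385 units long and the longest is 700 units, double the original length.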

The simulation is run for each variant 100 times, each for
10,000 ticks. As the rangefinder length increases both Hebbian
and Novelty-RAAHN experience significantly more failures.
However, Novelty-RAAHN indeed exhibits far fewer
failures (figure 3). Hebbian already experiences dozens of
failures at 30%, 40%, and 50% longer, while Novelty-RAAHN
experiences fewer than four complete failures and
fewer than seven partial failures at the same lengths. In addition,
the number of laps completed by Novelty-RAAHN is
significantly greater from 30% onward (p < 0.05). Novelty-RAAHN
is only affected eventually by dramatically increasing
the rangefinder lengths to values that provide little discernible
information. Thus Novelty-RAAHN is significantly
less sensitive to the precise sensory setup than Hebbian.

Discussion and Future Work 

The experimental results establish for the first time that 
RAAHN is able to learn effective maze navigation behavior 
without the need for an autopilot. This achievement opens 
up a wide range of application domains because it means 
RAAHN does not need knowledge of the problem domain 
a priori. Interestingly, the implication is also that RAAHN 
can now be applied to conventional RL problems where train¬ 
ing data is not labeled because now RAAHN only needs a 
modulation scheme to learn Hebbian correlations. Of course, 
[Figure 3 plot: x-axis Percent Longer; panel (b) Complete + Partial Failures]


Figure 3: Complete and Partial Failures (lower is better). The number of complete failures (a) and of complete plus partial
failures (b) is shown for Hebbian and Novelty-RAAHN for rangefinder length increments between 0% and 100%. In both panels,
the higher number of failed runs for Hebbian reflects the fact that Novelty-RAAHN completes significantly more laps from
30% longer onward (p < 0.01 for all increments of 30% and above).


because RAAHN is significantly different from conventional 
RL algorithms such as Q-learning (Watkins and Dayan, 1992) 
or SARSA (Rummery and Niranjan, 1994) its strengths and 
weaknesses are likely different as well, but the opportunity 
for an entirely new path of research in this direction offers 
the potential for new discoveries and insights that would not 
emerge from conventional RL. 

For example, already in this paper we begin to glimpse 
some principles behind learning a useful feature set in real 
time. In particular, it is likely that Queue-RAAHN performs 
significantly worse than Novelty-RAAHN because it has a 
tendency to “forget” salient aspects of the domain if they are 
not constantly revisited, as is the case when an agent crashes 
into a wall for an extended period of time: eventually its 
buffer becomes filled entirely with crash experiences that 
offer little utility beyond the scenario of a crash. Novelty- 
RAAHN avoids this trouble by only forgetting experiences 
that are redundant, instead attempting to maintain a set of 
experiences representative of the entire domain. The success 
of the novelty-driven buffer in Novelty-RAAHN thus hints 
at the importance of gathering and retaining experience in a 
principled manner. 
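
The contrast between the two buffers can be illustrated with a minimal Python sketch. It is not the buffer implementation used in the experiments: the class name `NoveltyBuffer`, the use of nearest-neighbour distance as the redundancy measure, and the capacity are assumptions made only for illustration. When the buffer is full, the sketch discards the stored experience that lies closest to another stored experience, so repeated crash readings displace each other rather than the rarer experiences.

```python
import numpy as np


class NoveltyBuffer:
    """Fixed-capacity buffer that forgets the most redundant experience.

    Hypothetical illustration: redundancy is measured here as the distance
    to the nearest other stored experience (an assumption, not the paper's
    stated criterion).
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.samples = []

    def add(self, experience):
        self.samples.append(np.asarray(experience, dtype=float))
        if len(self.samples) > self.capacity:
            self._drop_most_redundant()

    def _drop_most_redundant(self):
        data = np.stack(self.samples)
        # Pairwise Euclidean distances; mask the diagonal (self-distance).
        dists = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=-1)
        np.fill_diagonal(dists, np.inf)
        nearest = dists.min(axis=1)
        # The sample closest to a neighbour adds the least new information.
        self.samples.pop(int(np.argmin(nearest)))
```

A plain queue with the same capacity would instead drop the oldest entry, so a long run of near-identical crash readings would eventually push every other experience out.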

Advances in autoencoders, which provide RAAHN the
ability to represent higher-level concepts related to its domain,
also lead to possible enhancements to RAAHN.
In this way, RAAHN’s potential reaches beyond simply cal¬ 
ibrating sensitivity to a range of input parameters. Rather, 
as the domains in which RAAHN is applied become more 
complex, so does the possibility for more interesting feature 
sets. RAAHN can potentially stack several autoencoders 
(Hinton and Salakhutdinov, 2006) to learn high-level features 
from the complex data of vision, audition, or any other sen¬ 
sory modalities. The correlations learned by the Hebbian 
component can then serve to respond to those features in 
real time. It may also be possible to adapt RAAHN archi¬ 


tectures through neuroevolution (Stanley and Miikkulainen, 
2002; Floreano et al., 2008; Yao, 1999). If the topology of a 
RAAHN architecture can change over evolution, the process 
of finding an effective architecture (including autoencoder 
stacks) could be automated. 

Furthermore, by pairing Hebbian learning with an autoen¬ 
coder, much more becomes possible through Hebbian modu¬ 
lation than would be possible in a simple Hebbian network 
alone (without an autoencoder), thereby breathing new life 
into research focused on Hebbian learning. Supporting this 
view, when rangefinder lengths are extended beyond what is 
optimal, RAAHN exhibits significantly fewer failures than 
Hebbian alone. The performance of Hebbian alone degrades 
quickly without the autoencoder because Hebbian learns cor¬ 
relations best with sparse input activations (Olshausen and 
Field, 2004). With very long rangefinders, most of the in¬ 
puts are highly active at any given time, so Hebbian cannot 
learn meaningful correlations. RAAHN’s performance does 
not degrade as quickly because its autoencoder transforms 
the highly active inputs into more distributed features better- 
suited for the Hebbian component to learn effective driving 
behavior. Sparse autoencoders (Le et al., 2012) might help 
to limit such degradation even further. 

As an example of how recent work in Hebbian learning can 
enhance RAAHN, new ideas on augmenting Hebbian connec¬ 
tions to react to distal rewards (Soltoggio, 2015) (i.e. rewards 
from far away in time) can potentially shift RAAHN from its 
current limited temporal context to learning long-term causal 
dependencies. Recurrent connections might further allow 
learning to react to experiences from the past. These pos¬ 
sibilities in effect draw on advances in Hebbian learning in 
general, and provide fuel for further research into improving 
Hebbian learning. 

Finally, as a novel approach to RL, RAAHN aligns natu¬ 
rally with research in alife because agent behavior is shaped 
in RAAHN through low-level neuromodulation as opposed
to high-level value-function approximation (as in Q-learning 
and SARSA), making its analogy to low-level biological 
processes (in particular Hebbian plasticity) more accessible 
and open to study. Future work will focus on more com¬ 
plex domains that require for example behaving in location- 
dependent contexts by learning in real time the identifying 
features of different locations. 

Conclusion 

Moving beyond the original demonstration of RAAHN (Pugh 
et al., 2014) that depended on an initial autopilot phase, this 
paper showed how RAAHN can explore and learn on its own 
without an autopilot phase. By maintaining a buffer of expe¬ 
riences representative of the domain, such real-time learning 
becomes realistic. This new capability was demonstrated in a 
robot control domain where RAAHN learned to steer a robot 
through hallways on its own from scratch. The benefit of the 
architecture was further demonstrated by showing how much 
less its performance degrades compared to a simple Hebbian 
network when the sensory inputs become less optimal. The 
long-term implication is that RAAHN is a new sandbox and a 
new model for experimenting with modulation and reinforce¬ 
ment learning, which in the future can benefit from advances 
in both Hebbian learning and autoencoders alike. 
Abstract

We present an implementation of a biologically inspired
model for learning multimodal body representations in artificial
agents in the context of learning and predicting robot ego-noise.
We demonstrate the predictive capabilities of the proposed
model in two experiments: a simple ego-noise classification
task, where we also show the capability of the model
to produce predictions in the absence of input modalities; and an
ego-noise suppression experiment, where we show how
ego-noise suppression performance is affected by coherent and
incoherent proprioceptive and motor information passed as inputs
to the predictive process implemented by a forward model. In
line with what has been proposed by several behavioural and 
neuroscience studies, our experiments show that ego-noise at¬ 
tenuation is more pronounced when the robot is the owner of 
the action. When this is not the case, sensory attenuation 
is worse, as the incongruence of the proprioceptive and mo¬ 
tor information with the perceived ego-noise generates bigger 
prediction errors, which may constitute an element of sur¬ 
prise for the agent and allow it to distinguish between self¬ 
generated actions and those generated by other individuals. 
We argue that these phenomena can represent cues for a sense 
of agency in artificial agents. 

Introduction 

Empirical evidence from cognitive science and neuroscience 
suggests that we, as humans, maintain an internal represen¬ 
tation of our body, or a model of our motor system, and that 
such an internal model would be involved in processes of 
simulation of sensorimotor activity. These processes would 
affect the way we experience the interaction with the en¬ 
vironment and would be fundamental for the implemen¬ 
tation of basic cognitive skills. For example, simulation 
processes are thought to have a role in the way we differ¬ 
ently perceive self-generated actions or actions performed 
by other subjects. One of the proposals that explains this 
phenomenon (Blakemore et al., 2000a,b) says that when we 
perform a motor action, an efferent copy of the motor com¬ 
mands that our brain sends to our muscles would be used in 
a predictive process that anticipates the sensory outcomes 
of the movement. Such predictions would then be com¬ 
pared to the actual sensory consequences and, if the two 


correspond, the perceived sensory consequences are atten¬ 
uated. This would enable a differentiation between self¬ 
generated sensory events and those externally generated that 
are not mapped to any internally generated efferent copy 
of the motor commands (Blakemore et al., 2000a). The 
existence of such a self-monitoring mechanism would ex¬ 
plain, for example, why tickling sensations cannot be self- 
produced (Blakemore et al., 2000b), why people are better at 
recognising themselves than others when watching movies 
of only point-light walkers (Loula et al., 2005), why people 
are more accurate in predicting the landing point of a thrown 
dart from a video screen when they observe their own throw¬ 
ing action than when observing another person’s throwing 
action (Knoblich and Flach, 2001), or why people perceive
the loudness of sounds as less intense when they are self-generated
than when they are generated by other persons
or by software (Weiss et al., 2011). In this latter study on
selective attenuation of self-generated sounds, the authors 
proposed that the experience of perceiving actions as self¬ 
generated would be caused by the anticipation and, thus, the 
attenuation of the sensory consequences of such motor commands,
which would be related to “the privileged access to
internally generated efferent information during one’s own 
action” (Weiss et al., 2011). The sense of agency, that is the 
pre-reflective experience that we are the owner of an action 
we are executing, is thus proposed to be dependent on the 
degree of congruence vs. incongruence between predicted 
and actual sensory consequences of our bodily actions. 

In the investigation on sensorimotor simulation processes 
in the human brain, internal forward and inverse models 
have been proposed (Wolpert et al., 2001). A forward model 
(illustrated in Figure 1) - or predictor, as first proposed in
the control literature as a means to overcome problems such 
as the delay of feedback in control strategies - incorporates 
knowledge about sensory outcomes of self-generated ac¬ 
tions. Inverse models - or controllers, as they were initially 
proposed for implementing inverse kinematics processes 
for controlling robotic manipulators - perform the opposite 
transformation providing a system with the necessary motor 
command to go from an initial sensory situation to a desired 
Figure 1: An illustration of the forward model (predictor).


one. Such models encode the dynamics of the motor system 
and can provide artificial agents with multimodal represen¬ 
tations, as they fuse together sensory and motor informa¬ 
tion (Wilson and Knoblich, 2005), and with the capability 
to predict sensorimotor activities based on previous experience.
Studies such as the ones reported above shed light
on the importance that predicting the sensory consequences of
self-generated actions has for basic motor tasks and cognitive
skills. Equipping artificial agents with similar computational
processes has been shown to be a promising approach
in the development of different skills, such as navigation
(Moller and Schenck, 2008; Escobar et al., 2012), perception
of the functional role of objects (Kaiser, 2014), action
selection and tool-use (Schillaci et al., 2012) and sense
of agency (Pitti et al., 2009).

The work presented here adopts a biologically inspired
framework for internal body representations (Schillaci et al.,
2014) that can provide a robot with the capability to perform
simulations of sensorimotor activities based on previous experience.
Inspired by human development, the learning of
this body representation is intertwined with the interaction
experience of the robot with the external environment. In
particular, we frame this work in the context of one of the
biggest, and least explored, challenges of robot audition
(the artificial capability of listening): the presence of
ego-noise, that is, the noise that the robot generates while moving
around. Being able to estimate self-induced changes in the
auditory signal is crucial not only for attenuating the noise,
and thus for enhancing the auditory signal for further processing
such as speech recognition, but also for distinguishing
ego-noise from other sounds in natural acoustic environments,
which is a prerequisite for efficient and intuitive
interaction with other people and with the surroundings.

We demonstrate the predictive capabilities of the model in 
the auditory domain in two tasks. Firstly, we introduce the 
framework in a simple classification task, where the robot 
has to recognise a behaviour that it executes based on the 
comparison of the produced ego-noise to internal simula¬ 
tions of ego-noise produced by intended actions. We also 
show how our model can deal with situations in which input
information is missing, for example because of simulated damage
to the system: the model is still able to
classify, although with poorer performance. Secondly, we
show how the proposed framework, and the predictive capa¬ 
bilities that it provides, could serve as a basis for the devel¬ 
opment of a sense of agency in artificial agents. In particular, 
we report an experiment on ego-noise attenuation based on 
sensorimotor predictions, where the quality of the attenua¬ 


tion is dependent on the degree of congruence vs. incongru¬ 
ence between predicted and actual sensory consequences of 
self-generated actions. In line with the behavioural studies 
reported above, we show that prediction errors generated by 
internal sensorimotor simulations are smaller when the pro¬ 
prioceptive information is coherent with the events that are 
perceived from the external environment. Simply put, we 
show that sensory attenuation is more pronounced when the 
robot is the owner of the action, and we argue that this could 
serve as a cue for self-agency in artificial agents. 

In the rest of the paper, we first introduce the framework
presented in Schillaci et al. (2014) and extend it. We then
illustrate and discuss the experiments mentioned above.
Finally, we draw conclusions and outline future
work.

An Internal Body Representation for a 
Humanoid Robot 

Evidence from behavioural sciences and neuroscience suggests
that motor and brain development are strongly intertwined
with the experiential process of exploration, where
internal body representations would be formed and maintained
over time (Cang and Feldheim, 2013). Kaas (1997)
reported the existence of topographic maps in the visual, au¬ 
ditory, olfactory and somatosensory systems, as well as in 
parts of the motor brain areas. Researchers proposed that 
such maps would self-organise throughout the brain devel¬ 
opment and along the sensorimotor experience of the indi¬ 
vidual with the external environment. They would function 
as projections of sensory receptors and of effector systems, 
and are arranged in a way that adjacent regions process spa¬ 
tially close sensory parts of the body. Many studies support 
the existence of an integrated representation of visual, so¬ 
matosensory, and auditory peripersonal space in human and 
non-human primates (see for example Holmes and Spence 
(2004)), suggesting that the brain maintains integrated mul¬ 
timodal representations, which are essential for sensorimo¬ 
tor control (Maravita and Iriki, 2004). 

During the last couple of decades, interest in the possibil¬ 
ity to develop models inspired by the mechanisms of human 
body representations has been growing also in the robotics 
community. In robot audition, for example, Ince and col¬ 
leagues investigated methods for learning, predicting and 
suppressing robot ego-noise (Ince et al., 2009). The authors 
built up an internal body representation of a humanoid robot 
consisting of motor sequences mapped to the recorded motor
noises and their spectra. This resulted in a large noise tem¬ 
plate database that was then used for ego-noise prediction 
and subtraction. 

Here, we report an implementation of a biologically inspired
model for body representations that can encode experience
gathered through sensorimotor learning and that can
generate predictions of auditory and motor states. In particular,
we propose an internal models framework consisting
of connected neural networks that simulate distinct sensorimotor
brain areas. The internal model encodes sensory
and motor modalities as topographic maps that self-organise
throughout the interaction of the robotic agent with the external
environment. Moreover, a parallel intermodal mapping
is performed: sensory and motor maps are connected
through Hebbian links that are strengthened when co-occurring
multimodal activity is observed.

The model architecture is inspired by the Epigenetic 
Robotics Architecture (Morse et al., 2010), where a struc¬ 
tured association of multiple Self-Organising Maps (SOMs) 
(Kohonen, 1982) is adopted for mapping different sensori¬ 
motor modalities in a humanoid robot, and it is based on 
similar works we previously published (Kajic et al., 2014; 
Schillaci et al., 2014). Self-organising maps have the advan¬ 
tage of producing low-dimensional and discretised represen¬ 
tations of the input space of the training samples. 

In the proposed model, multiple SOMs, each representing 
a sensory or motor modality, are associated through unidi¬ 
rectional Hebbian links: each node of the input map is con¬ 
nected to each node of the output map, where the connection 
is characterised by a weight. The weight is updated accord¬ 
ing to a positive Hebbian rule that simulates synaptic plas¬ 
ticity of the brain: the connection between a pre-synaptic 
neuron (a node in the input map) and a post-synaptic neu¬ 
ron (a node in the output map) increases if the two neurons 
are simultaneously activated. Learning of the internal model 
consists in updating the SOMs and the Hebbian connections 
with sensory and motor data gathered through an exploration 
behaviour executed by the robot. During the execution of the 
robot movements, sensory and motor data are provided as 
training inputs to the corresponding maps in an online fashion.
A SOM is constructed as a grid of neurons, where each
neuron is represented as an n-dimensional weight vector $w_i$
(Kajic et al., 2014; Kohonen, 1982). The number of dimensions
of a weight vector corresponds to the dimensionality
of the input data. Weights in the network are initially set
to random values and then adjusted iteratively by presenting
the input vector $x_p$. In each iteration, the winning neuron i
is selected as the neuron whose weights are closest to the input
vector in terms of the Euclidean distance. After selecting a
winning neuron, the weights of all neurons are adjusted:

$$\Delta w_j = \eta(t)\, h(i, j, t)\, (x_p - w_j) \qquad (1)$$

The parameter $\eta(t)$ is a learning rate which defines the
speed of change. The function $h(i, j, t)$ is a Gaussian neighborhood
function defined over the grid of neurons as:

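$$h(i, j, t) = \exp\left(-\frac{\lVert r_i - r_j \rVert^2}{2\pi\sigma(t)^2}\right) \qquad (2)$$

where $r_i$ and $r_j$ denote the positions of neurons i and j on the grid.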

The learning rate $\eta(t)$ and the spread of the Gaussian
function $\sigma(t)$ are held constant for a certain time interval,
and are annealed exponentially afterwards.¹ The function is
centered around the winning neuron i and its values are computed
for all neurons j in the grid. The spread of the function
determines the extent to which neighbouring weights of
a winning neuron are going to be affected in the current iteration.
The topology of the network is preserved by pulling
neighbouring neurons together towards the winning node.
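
As a rough illustration of a single SOM training iteration, the following Python sketch selects the winner, evaluates the Gaussian neighbourhood on the lattice, and updates all weight vectors. It is a sketch under stated assumptions rather than the authors' code: the function names `som_step` and `make_grid` are hypothetical, the update uses the conventional Kohonen form in which weights move toward the input ($x_p - w_j$), the neighbourhood is computed from squared lattice distances with a $2\pi\sigma^2$ denominator, and the annealing of $\eta$ and $\sigma$ is left out.

```python
import numpy as np


def make_grid(rows, cols):
    """Lattice coordinates of the nodes, shape (rows * cols, 2)."""
    ys, xs = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    return np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)


def som_step(weights, grid, x, eta, sigma):
    """One SOM update, in place.

    weights: (n_nodes, dim) weight vectors; grid: (n_nodes, 2) lattice
    positions; x: (dim,) input vector. Returns the winner index.
    """
    # Winner: node whose weight vector is closest to the input.
    winner = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    # Gaussian neighbourhood centred on the winner (assumed form).
    grid_dist_sq = np.sum((grid - grid[winner]) ** 2, axis=1)
    h = np.exp(-grid_dist_sq / (2.0 * np.pi * sigma ** 2))
    # Move every node toward the input, scaled by eta and the neighbourhood.
    weights += eta * h[:, None] * (x - weights)
    return winner
```

With a 10x10 lattice, `make_grid(10, 10)` supplies the node positions, and `som_step` is called once per incoming sample.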

After every update of the SOMs, the Hebbian links con¬ 
necting each pair of maps are updated as well. The Hebbian 
update corresponds to the following steps. For mapping an 
input map (e.g. the motor map) to an output map (e.g. the 
auditory map): 

- select the pre-synaptic neuron (winner node) as the closest node 
i in the input map to the current input pattern x (e.g. the joint 
rotation); 

- select the post-synaptic neuron (winner node) as the closest node 
j in the output map to the current output pattern y (e.g. the robot 
ego-noise); 

- strengthen the connection $w_{ij}$ between the pre- and post-synaptic
neurons according to the modified positive Hebbian rule:

$$\Delta w_{ij} = \lambda\, A_i(x)\, A_j(y) \qquad (3)$$

where $A_i(x)$ is the activation function of the neuron i over
the Euclidean distance between the neural weights and the
data pattern x, and $\lambda$ is a learning rate used for slowing down the
growth of the weights (in the experiments presented here, it
is initialised to 0.1). The activation function of a neuron,
A(d), is computed as:

$$A(d) = \frac{1}{1 + \tanh(d)} \qquad (4)$$

where d is the normalised Euclidean distance between the 
position of the node and the input pattern. 

After the update, a normalisation is performed on all the 
links from the input map converging to a node in the out¬ 
put map, for each node in the output map, as described by 
Miikkulainen (1990). Such a normalisation implements a 
forgetting process, since it strengthens the updated link and 
it weakens all the other connections. The same process is 
performed on the unidirectional links connecting each pair 
of maps in the model in both directions. 
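
The Hebbian update and the subsequent normalisation can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the names `activation` and `hebbian_step` are hypothetical, the activation is assumed to decrease with distance as $1 / (1 + \tanh(d))$, the distances are used without the normalisation mentioned in the text, and the normalisation step is assumed to rescale the links converging on each output node to unit Euclidean length.

```python
import numpy as np


def activation(d):
    # Assumed form: equals 1 at distance 0 and decreases as d grows.
    return 1.0 / (1.0 + np.tanh(d))


def hebbian_step(links, in_weights, out_weights, x, y, lam=0.1):
    """Strengthen the link between the two winner nodes, then normalise.

    links[i, j] connects node i of the input map to node j of the output
    map. Returns the indices (i, j) of the pre- and post-synaptic winners.
    """
    d_in = np.linalg.norm(in_weights - x, axis=1)
    d_out = np.linalg.norm(out_weights - y, axis=1)
    i, j = int(np.argmin(d_in)), int(np.argmin(d_out))
    links[i, j] += lam * activation(d_in[i]) * activation(d_out[j])
    # Normalise the links converging on each output node (assumed L2 norm).
    # Columns that are still all zero are left untouched.
    norms = np.linalg.norm(links, axis=0)
    nonzero = norms > 0
    links[:, nonzero] /= norms[nonzero]
    return i, j
```

Because each column is rescaled after every update, strengthening one link necessarily weakens the other links converging on the same output node, which is the forgetting effect described above.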

The trained model can be used for performing sensory and 
motor predictions. Predictive processes can be activated by 
querying the model with partial or full sensorimotor infor¬ 
mation. For example, we can infer the ego-noise produced 
by the execution of a specific motor command (forward pre¬ 
diction) from the model depicted in Figure 2 by querying 
the model with an input to the proprioceptive map, consisting
of the joint configuration of the robot, and an input to
the motor map, consisting of the joint rotations, which are

¹ In the experiments presented in the following section, we
set $\eta = 0.9$ and $\sigma = 0.7$.
therefore propagated to the auditory map. In fact, a predictive
system based on propagation of signals between maps
has been implemented. The propagation of signals works as 
follows. Given a sensory or motor input: 

- Find the winner node w, that is the closest node to the input, and its k
neighbours (k is set to 5 in the experiments presented here) in the
corresponding map, and calculate their activations using the
activation function described in (4);

- Propagate the activation of the nodes in the winners list of the 
input map to all the nodes in the output map connected to it. 
The propagated value to each node in the output map is equal to 
the activation of the selected node in the input map multiplied 
by the weight of the Hebbian link connecting the selected node 
in the input map to the corresponding node in the output map; 
multiple propagations to the same node in the output map are 
summed up; 

- Compute the prediction in the output modality as the weighted 
average of the positions of the nodes in the output map, each 
weighted by the incoming propagation. 

If an observation of the output modality is available, a pre¬ 
diction error can be computed as the distance between the 
predicted outcome and the observation. Moreover, multiple 
propagations can be executed from different input modali¬ 
ties to the same output modality, as illustrated in Figure 2. 
From each input modality, signals can be spread out to the 
desired output modality as described above. Thus, incoming 
propagations onto the output map can be summed up and a 
prediction can be computed as the weighted average of the 
nodes’ positions multiplied with their activations. 
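
The propagation steps above can be sketched in Python as follows. The sketch makes several assumptions that the text does not fix: the function names are hypothetical, the k neighbours are taken to be the k next-closest nodes in input space (they could equally be lattice neighbours), and the activation is assumed to be $1 / (1 + \tanh(d))$.

```python
import numpy as np


def activation(d):
    # Assumed form of the activation function in (4).
    return 1.0 / (1.0 + np.tanh(d))


def propagate(in_weights, links, x, k=5):
    """Spread activation from an input map to an output map.

    Returns the summed incoming propagation per output node, shape (n_out,).
    """
    d = np.linalg.norm(in_weights - x, axis=1)
    winners = np.argsort(d)[: k + 1]  # winner plus k next-closest nodes
    # Each selected node contributes activation * link weight; sums over nodes.
    return activation(d[winners]) @ links[winners]


def predict(out_weights, incoming):
    """Weighted average of output node positions; None if nothing arrives."""
    total = incoming.sum()
    if total <= 0:
        return None
    return incoming @ out_weights / total


def prediction_error(prediction, observation):
    return float(np.linalg.norm(prediction - observation))
```

For a forward prediction from two input modalities, the two results of `propagate` (for example from the proprioceptive and the motor map) are added before calling `predict` on the auditory map.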



Figure 2: An example of a forward model consisting of three 
maps. 

Figure 2 illustrates a forward model (as described in Fig¬ 
ure 1) implemented using the proposed architecture com¬ 
posed of three SOMs: a proprioceptive map, encoding the 
initial joint configuration of the robot; a motor map, encoding
the rotation applied to the joints from the initial positions;
and an auditory map, encoding the noise produced
by the movements. An inverse model can be implemented 
with two sets of directional Hebbian links: the first starting 
from the proprioceptive map and ending at the motor map, 
and the second starting from the auditory map and ending at 
the motor map. 


Ego-noise representation 

We represent the ego-noise produced by the robot move¬ 
ments using Mel-frequency cepstral coefficients (MFCCs), 
which are features derived from a type of cepstral represen¬ 
tation of the auditory signal commonly used in speech recog¬ 
nition (Sahidullah and Saha, 2012). In this work, MFCC 
features are derived performing the following steps: 

- Calculate the Fourier transform of an audio chunk. In the ex¬ 
periments reported here, we used a single channel audio signal, 
recorded from the robot with a sampling rate of 48 kHz. Audio 
chunks of 40 ms are extracted from the signal using a rectan¬ 
gular window. Chunks are extracted every 20 ms (that is, with 
a 50% overlap between subsequent chunks). FFT size is 2048 
samples. 32 triangular overlapping filters are used in the Mel 
filterbank, with a mel filter width of 200. The frequency range
of the filterbank goes from 0 to 16 kHz;

- Apply the Mel filterbank to the power of the spectrum and sum 
up the energy in each filter; 

- Calculate the Discrete Cosine Transform of the logarithm of the 
filterbank energies; 

- Keep the first 26 or 32 coefficients of the DCT as MFCC fea¬ 
tures. 

For implementing the MFCC feature extraction process, 
we adopted and extended an existing open source and cross¬ 
platform digital signal processing library, named Aquila 
DSP (http://aquila-dsp.org/). Before being processed, input 
data streams are aligned in time, to ensure that the audi¬ 
tory stream matches the actions executed by the robot. We 
use the NAOqi and experimental NAOqi-Modularity frame¬ 
works provided by Aldebaran Robotics, which allow us to 
combine asynchronous data collection and data processing 
using filter chains in the humanoid robot Nao. 
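
For readers without access to Aquila DSP, the feature extraction steps listed above can be approximated with NumPy alone. The sketch below is not the Aquila implementation: the function names are hypothetical, the filterbank uses the textbook construction with triangular filters equally spaced on the mel scale (so the "mel filter width of 200" setting has no counterpart here), and an unnormalised DCT-II is used. The chunk length, hop, FFT size, number of filters and frequency range follow the values given in the text.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, rate, f_min, f_max):
    """Triangular filters with edges equally spaced on the mel scale."""
    edges = mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / rate)
    bank = np.zeros((n_filters, freqs.size))
    for m in range(n_filters):
        lo, mid, hi = edges[m : m + 3]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mfcc(signal, rate=48000, chunk_ms=40, hop_ms=20, n_fft=2048,
         n_filters=32, n_coeffs=26, f_min=0.0, f_max=16000.0):
    """Return an array of shape (n_chunks, n_coeffs)."""
    chunk = rate * chunk_ms // 1000
    hop = rate * hop_ms // 1000
    bank = mel_filterbank(n_filters, n_fft, rate, f_min, f_max)
    # Unnormalised DCT-II basis applied to the log filterbank energies.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), n + 0.5) / n_filters)
    features = []
    for start in range(0, len(signal) - chunk + 1, hop):
        frame = signal[start : start + chunk]  # rectangular window
        power = np.abs(np.fft.rfft(frame, n=n_fft)) ** 2
        energies = bank @ power
        # Small offset avoids log(0) for silent chunks.
        features.append(dct @ np.log(energies + 1e-12))
    return np.array(features)
```

At 48 kHz, a 40 ms chunk holds 1920 samples and is zero-padded to the 2048-point FFT; consecutive chunks start 960 samples apart.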

Experiments 

We report here two experiments. Firstly, we present a simple 
classification experiment with the aim of demonstrating the 
learning and predictive capabilities that the proposed model 
can provide to artificial agents. In particular, we adopt the 
proposed framework for allowing a humanoid robot to learn 
the ego-noise that it is producing when performing a mo¬ 
tor behaviour consisting of periodical horizontal head rota¬ 
tions (see Figure 3). Thus, we implement a classification 
experiment where the robot has to classify a behaviour it is 
executing in terms of velocity profile, by comparing the pro¬ 
duced ego-noise to simulations of the ego-noise produced by 
imaginary executions of all the behaviours in the repertoire. 
In addition, we show how the model can deal with situations
in which input information is missing, for example due
to damage in the system: the model is still
able to classify, although with poorer performance.

Secondly, we describe an experiment on ego-noise at¬ 
tenuation with the aim of showing that the computational 
processes implemented by our framework resemble those 
Figure 3: The robot behaviour executed during the recordings consisted in periodical rotations of the head on the yaw axis.






Figure 4: Example of trajectories of synchronised audio-motor data from the four velocity profiles. The upper plot shows the
trajectories of the first four MFCC coefficients extracted from the single-channel auditory signal recorded while executing the
head rotations. The bottom plot shows the head yaw joint position (red line) and the head yaw rotation over 40 ms (green
line). The columns represent the different velocity profiles. From left to right: slow, medium, fast and very fast.


proposed by the behavioural studies mentioned in the in¬ 
troduction of this work, which would explain the mech¬ 
anisms behind the sense of agency (Weiss et al., 2011; 
Blakemore et al., 2000a). In particular, we report an exper¬ 
iment on ego-noise attenuation based on sensorimotor pre¬ 
dictions, where the quality of the attenuation is dependent 
on the degree of congruence vs. incongruence between pre¬ 
dicted and actual sensory consequences of self-generated ac¬ 
tions. In line with the behavioural studies reported above, 
we show that prediction errors generated by internal sen¬ 
sorimotor simulations are smaller when the proprioceptive 
and motor information are coherent with the events that are 
perceived from the external environment. We reported simi¬ 
lar results in a different robotic experiment in the context of 
visuo-motor coordination (Schillaci et al., 2013). 

Ego-noise classification 

In the first experiment, we trained four different models 
with sensorimotor data gathered while executing a robot be¬ 
haviour consisting of periodical horizontal head rotations 
with four different velocity profiles. We implemented the 
four velocity profiles using the original Aldebaran NAOqi 
controller with gradually increasing velocity thresholds, 
here named as slow, medium, fast and very fast. Figure 
4 shows sample trajectories of aligned auditory and motor 
training data for each of the four velocity profiles. Training 


of the models has been tested online and runs in real-time on 
an Aldebaran Nao v.5 robot. However, the classification re¬ 
sults reported here are taken from models trained and tested 
offline. Sensorimotor data was gathered while the robot executed
each of the four velocity profiles for ca. 200 seconds,
resulting in 9449 training samples for the slow velocity profile,
9449 for the medium, 9459 for the fast and 9459 for
the very fast. Each training sample consisted of the following
sensorimotor information: S(t), encoding the MFCC features
extracted from a single audio chunk (see Section “Ego-noise
representation” for more details); S(t-1), encoding the
initial position of the head yaw joint, that is the closest position
in time to the first audio sample of the MFCC chunk;
M(t-1), encoding the rotation of the head yaw joint over 40
ms, from S(t-1).

Four internal models have been trained with the different 
datasets (slow, medium, fast and very fast velocity profiles). 
Each internal model consisted of three maps (see Figure 2): 
a proprioceptive map, encoding a one-dimensional feature
space representing the initial head yaw joint position, that
is S(t-1); a motor map, encoding a one-dimensional
feature space representing the head yaw joint rotation, that
is the motor command M(t-1); an auditory map, encoding
a 26-dimensional MFCC feature space representing the
robot ego-noise. Each internal model encoded both the inverse
and the forward models, as these are implemented by
the Hebbian tables containing the proper directional links,
as explained in the previous section. Each SOM consisted
of a 10x10 lattice of nodes, whose weights were randomly
initialised by sampling from a Gaussian distribution N(0, 1).
The weights of the Hebbian links connecting each pair of
SOMs were all initialised to 0.

The classification task consisted of feeding the four internal
models (slow, medium, fast and very fast) with test data
samples drawn from the datasets recorded under each of the
four velocity profiles, and comparing the predicted auditory
outcome with the actual one. Each auditory chunk is assigned
the velocity profile of the forward model that produced
the smallest prediction error (calculated as the Euclidean
distance between the predicted and the observed MFCCs).
Figure 5 illustrates the classification process using internal
simulations.
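
The decision rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary interface and the toy 3-D vectors (standing in for the 26-D MFCC features) are hypothetical and not taken from the implementation used in the experiments.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_chunk(observed_mfcc, predictions):
    """Return the velocity profile whose forward model predicted best.

    `predictions` maps a profile name to the MFCC vector predicted by
    that profile's forward model (hypothetical interface).
    """
    return min(predictions,
               key=lambda name: euclidean(predictions[name], observed_mfcc))


# Toy 3-D vectors standing in for the 26-D MFCC features.
predictions = {
    "slow": [0.0, 0.0, 0.0],
    "medium": [1.0, 1.0, 1.0],
    "fast": [2.0, 2.0, 2.0],
    "very fast": [3.0, 3.0, 3.0],
}
label = classify_chunk([1.1, 0.9, 1.2], predictions)
```

The observed chunk is labelled with the profile whose prediction lies closest to it, here "medium".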


Figure 5: Diagram of the classification process.

Classification performance was measured for each trained 
model and on 5 different runs (thus, different test datasets). 
Table 1 shows the confusion matrix for the best run, when 
using only forward predictions with full input information. 


Executed     Classified as                                # samples
velocity     Slow      Medium    Fast      Very fast

Slow         94.00%     5.50%     0.50%     0.00%         200
Medium        6.00%    89.50%     4.00%     0.50%         200
Fast          0.50%    14.00%    85.50%     0.00%         200
Very fast     4.00%     4.50%     7.50%    84.00%         200


Table 1: Confusion matrix showing the performance of the 
classification using only forward predictions. 


We simulated damage to the system, implemented as a lack
of proprioceptive and motor information during the
predictive process. Internal simulations were performed with
partial inputs, in this case only the auditory modality.
The first step consisted in estimating, through an inverse
prediction, the motor command needed to generate the
auditory outcome given as input to the model. The predicted
motor command is then fed into the corresponding forward
model, which anticipates the sensory outcome of the intended
action. Table 2 shows the confusion matrix of the best of 5
classification runs, where we executed full internal
simulations using only partial information as input. As
expected, predictions estimated with missing proprioceptive
inputs degraded the classification performance. However,
the system still classifies every velocity profile correctly
in at least 50% of cases (54% for the very fast profile,
the worst case).


Executed     Classified as                                # samples
velocity     Slow      Medium    Fast      Very fast

Slow         85.50%    11.50%     2.50%     0.50%         200
Medium        7.50%    77.50%    13.00%     2.00%         200
Fast          1.00%    16.00%    80.00%     3.00%         200
Very fast     1.50%     7.50%    37.00%    54.00%         200


Table 2: Confusion matrix showing the performance of the 
classification using both the inverse and forward predictions 
with missing input data (proprioceptive joints information). 
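
As a worked check of the two confusion matrices, the short Python sketch below recomputes an overall accuracy for each condition. The figures are copied from Tables 1 and 2; the overall accuracy itself is a derived quantity not reported in the text, and it equals the mean of the diagonal only because every class has the same number of test samples (200).

```python
# Rows: executed velocity; columns: classified as (slow, medium, fast, very fast).
table1 = [
    [94.0, 5.5, 0.5, 0.0],
    [6.0, 89.5, 4.0, 0.5],
    [0.5, 14.0, 85.5, 0.0],
    [4.0, 4.5, 7.5, 84.0],
]
table2 = [
    [85.5, 11.5, 2.5, 0.5],
    [7.5, 77.5, 13.0, 2.0],
    [1.0, 16.0, 80.0, 3.0],
    [1.5, 7.5, 37.0, 54.0],
]


def overall_accuracy(matrix):
    """Mean of the diagonal; valid because every class has 200 samples."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)


acc_full = overall_accuracy(table1)      # forward predictions, full input
acc_partial = overall_accuracy(table2)   # inverse + forward, auditory input only
```

Under this reading, the best run reaches 88.25% overall with full input and 74.25% with auditory input only, with most of the loss concentrated in the very fast profile.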


Ego-noise attenuation as a cue for sense of agency 

We performed a second experiment on ego-noise attenuation
based on ego-noise predictions. In the experiment, we
simulated the robot listening to an ego-noise signal
(previously recorded from the robot itself) while performing
a motor behaviour. During these movements, a forward
model - trained with a periodic head rotation behaviour
with slow velocity profile, as in the previous experiment -
was used to execute sensorimotor simulations aimed at
predicting the ego-noise generated by the current motor
behaviour of the robot. We tested three conditions. In the
first, we simulated the robot executing a motor behaviour
that is coherent with the observed ego-noise. In the second,
we simulated the robot not moving, thus holding the head in
an initial position (applying a null motor command). In the
third, we simulated the robot performing a periodic head
rotation that is not aligned in time with the observed
ego-noise. In each of the three conditions, we predicted the
auditory outcomes of the movements by feeding the forward
model with the joint information corresponding to the
current motor behaviour. We then subtracted the estimated
noise from the original auditory information.

Ego-noise suppression is performed in the log-filterbank
energy domain. An inverse DCT (Discrete Cosine Transform)
is applied to the 32-MFCC feature vectors representing
the predicted and actual ego-noise chunks, producing
two 32-D vectors (log-filterbank energies). The vector
representing the predicted ego-noise is then subtracted
from the one representing the actual ego-noise. Where the
result of the subtraction is negative in a dimension, spectral
flooring is applied, that is, the attenuated signal in that
dimension is computed as the original one multiplied by a
factor of 0.1.
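
The subtraction and flooring step can be sketched as follows. This is a minimal illustration that assumes the inverse DCT has already been applied, so both inputs are log-filterbank energy vectors; the function name, the parameter name `floor_factor` and the 3-D toy vectors are hypothetical.

```python
def attenuate(actual_log_energy, predicted_log_energy, floor_factor=0.1):
    """Subtract the predicted ego-noise from the actual one, per channel.

    Where the difference is negative, spectral flooring replaces it with
    the original value multiplied by `floor_factor`.
    """
    attenuated = []
    for actual, predicted in zip(actual_log_energy, predicted_log_energy):
        diff = actual - predicted
        if diff < 0:
            diff = floor_factor * actual
        attenuated.append(diff)
    return attenuated


# Toy 3-channel example (the text uses 32 channels).
result = attenuate([4.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

In the toy example the first channel is partially attenuated, the second is fully suppressed, and the third is floored because the prediction exceeds the observation.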

Figure 6 qualitatively illustrates the results of the
ego-noise attenuation. As evident from the plots, ego-noise
attenuation is more pronounced when the input data fed to
the forward model is coherent with the auditory output (left
graphs in the Figure - dark blue colour corresponds to total
suppression of the ego-noise). The attenuation is worse
when there is incongruence between predicted and actual
sensory consequences of self-generated actions, as in the
second and third conditions. In particular, the second
behaviour (head holding an initial position) generates a
constant ego-noise prediction. The difference between the
original and predicted ego-noise (bottom row, central
column) is thus higher than when the motor behaviour
matches the observed ego-noise. The same effect is observed
in the third condition, where the motor behaviour does not
match the observed ego-noise.

In line with the studies reported in the introduction of this
paper, our experiment shows that prediction errors generated
by sensorimotor simulations are smaller when the
proprioceptive and motor information are coherent with the
perceived ego-noise. Simply put, sensory attenuation is more
pronounced when the robot is the owner of the action, as
it has "a privileged access to internally generated efferent
information during its own action" (Weiss et al., 2011), as
simulated in the first condition of this experiment. The
second and third conditions simulated the situation where the
robot is listening to another artificial agent performing a
periodic horizontal head rotation behaviour that sounds
exactly as if it had been produced by the robot itself.
However, the fact that the observed proprioceptive and
motor information were incoherent with the observed
ego-noise did constitute an element of surprise, as the
forward model fed with such input data produced worse
ego-noise predictions than in the first condition of the
experiment - as evident from the bigger prediction errors
illustrated in Figure 6, bottom plots of the second and third
columns.

Conclusion 

We presented an implementation of a biologically inspired
model for coding internal body representations that can
generate predictions of auditory and motor experiences. The
predictive capabilities provided by the models were tested
in two experiments: a simple ego-noise classification task,
where we also showed the capability of the model to
produce predictions even in the absence of input modalities;
and an ego-noise suppression experiment, where we showed
how ego-noise prediction, and thus suppression, performance
depends on whether the input data to the forward model are
coherent or incoherent with the auditory observations.
In line with the behavioural studies reported in the
introduction of this paper, our experiment shows that
prediction errors generated by sensorimotor simulations are
smaller when the proprioceptive and motor information are
coherent with the perceived ego-noise. Simply put, sensory
attenuation is more pronounced when the robot is the owner
of the action. When this is not the case, sensory attenuation
is worse, as the incongruence of the proprioceptive and
motor information with the perceived ego-noise generates
bigger prediction errors, which may constitute an element of
surprise for the agent and allow it to distinguish between
self-generated actions and those generated by other
individuals.

In a separate study, we implemented a self-monitoring
mechanism in a humanoid robot for the prediction and
attenuation of visually detected consequences of
self-generated actions. During a training phase consisting of
a self-exploration behaviour, the system trained a forward
model with motor data and visual data encoding movements
detected from the robot camera. Consistent with the study
reported here, sensory attenuation was more prominent in
areas of the visual input where movements of the robot were
expected. Instead, no attenuation was observed in areas of
the visual input where the movements of an external object
were detected. Again, this demonstrates that sensory
attenuation processes can be adopted as a cue for
distinguishing movements produced by external agents from
those produced by the agent itself.

Therefore, we argue that equipping artificial agents with
internal body representations and with the capability to
perform sensorimotor predictions based on previous experience
can represent a promising research direction towards the
development of a sense of agency in artificial systems.

Abstract

Abstract concepts are rules about relationships such as identity 
or sameness. Instead of learning that specific objects belong to 
specific categories, the abstract concept of same/different 
applies to any objects that an organism might encounter, even if 
those objects have never been seen before. In this paper we 
investigate learning of abstract concepts by computer, in order 
to recognize same/different in novel data never seen before. To 
do so, we integrate recursive self-organizing maps with the data 
they are processing into a single graph to enable a brain-like 
self-adaptive learning system. We perform experiments on 
simple same/different datasets designed to resemble those used 
in animal experiments and then show an example of a practical 
application of same/different learning using the approach. 

Introduction 

Living organisms facing the challenges of survival must 
distinguish food from poison, friend from foe. In many simple 
organisms this may be a simple case of classification or 
categorical perception. For example, it is claimed that an 
earthworm can learn to avoid harmful stimuli by classifying 
the stimuli as harmful after some period of learning and 
avoiding them (Wilson et al., 2014), or a damselfish can 
recognize “faces” (Siebeck et al., 2015). But in more complex 
organisms, there is a need for a more complex form of 
learning: the recognition of abstract concepts. 

Perhaps one of the most fundamental abstract concepts is 
same/different (S/D). Instead of learning that specific objects 
belong to specific categories, the abstract concept of S/D 
applies to any objects that the organisms might encounter, 
even if those objects have never been seen before. Four 
identical apples are the same, just as five identical cars are the 
same. A variety of species of mammals and birds are different, 
just as a variety of colors are different. 

In theory, such a skill might have advantages - perhaps
detecting differences in a group might indicate the presence of
a predator, or differences within a nest might reveal that eggs
have been taken or replaced with those of another. However,
experiments have shown that while organisms such as pigeons
are capable of being taught the higher-level concept of
same/different, this ability comes far more easily to animals
with greater intelligence, such as chimpanzees, baboons,
capuchin and rhesus monkeys (Katz et al., 2007).


Abstract concept learning is thus considered to form the 
basis of higher order cognition in humans (Katz et al., 2007). 
For many decades researchers have used the ability to judge 
same/different as a core theme in cognitive development, 
cognition, and comparative cognition (Goodman & Melinder, 
2007; Mackintosh, 2000; Shettleworth, 2009; Thompson & 
Oden, 2000). The abstract concept of same/different is also 
thought to be necessary within mathematics, and learning 
language (Marcus et al., 1999; Piaget, 1970). 

In computer science, the majority of research on learning 
focuses on classification or prediction, i.e., learning how to 
categorize or cluster specific types of data, or learning patterns 
and regularities within specific examples of data (Michie et 
al., 1994). Conventional machine learning typically does not 
study how abstract concepts can be learned. 

In this paper we investigate learning of abstract concepts by 
computer, in order to recognize same/different in novel data 
never seen before. The study explores what type of 
information processing is required in order to perform this 
task, evaluating the algorithm with experiments similar to 
those performed with animals, and assesses whether 
same/different can truly be called higher order cognition. We 
also show an example of a more practical application of 
same/different learning using the approach. 

The remainder of the paper is organized as follows. The 
next section describes existing work. The sections after that 
describe the method, experiments, and results. The final 
section provides our conclusions. 

Background 

Abstract Concept Learning 

Abstract concepts are rules about relationships such as identity 
or sameness, and are considered to form the basis of much of 
our so-called higher order cognitive processing (Katz et al., 
2007). Children develop cognition in stages and expand their 
abstract concept of sameness to include number, length, area, 
and volume (Piaget, 1970). In the laboratory, the abstract 
concept of sameness is studied in same/different (S/D) tasks 
where subjects view stimuli and then make one of two 
responses to indicate whether the stimuli are the same or 
different (Katz et al., 2007). The determination of abstract- 
concept learning is accurate performance with novel test 
stimuli, i.e., the subject learned an abstract rule that transcends 
the particular training stimuli. Such transfer performance
makes abstract-concept learning unique and different from
other forms of concept learning (Katz et al., 2007). 

It is important to differentiate abstract concepts from 
natural concepts. Natural concepts are categories of items
which share specific features, such as cars, chairs, flowers,
people, water, or trees. In contrast, abstract concepts do not
involve learning specific stimulus features. Instead, they 
involve learning the relationships between items (Katz et al., 
2007). Thus, abstract concepts involve relational learning as 
opposed to the item-specific learning of natural-concept 
learning. S/D experiments have been run on pigeons (Katz & 
Wright, 2006) and rhesus monkeys (Katz et al., 2002). 

Self-Organizing Maps (SOMs) 

This work uses graph-based SOMs in order to achieve abstract 
concept learning. The SOM is a biologically inspired brain- 
map model (Kohonen, 2013) which is often used for 
visualization of data to obtain a more abstract view (Kohonen, 
1998). It is an automatic data-analysis method resembling the 
classical vector quantization, with the addition that more 
similar models will be associated with nodes that are closer in 
the grid, and less similar models will be situated gradually 
farther away in the grid (Kohonen, 2013). 

SOMs have been used extensively for a variety of 
clustering, classification and visualization applications. For
example, Merelo et al. (1994) used SOMs to develop a protein
classification algorithm. Aly et al. (2008) and Lawrence et al. 
(1997) used SOMs for face recognition. Kamimoto (2005) 
used SOMs to evaluate the vibration of motor-operated 
electric tools. Teranishi (2009) used supervised SOM to 
estimate the bending rigidity of real bills using only the 
acoustic energy pattern. (Fatigued bills affect the daily 
operation of automated teller machines). Okada et al. (2009) 
used multiple SOMs for control of a visuo-motor system that 
consists of a redundant manipulator and multiple cameras in 
an unstructured environment. Tateyama et al. (2004) proposed 
a pre-teaching method for reinforcement learning using a 
SOM in order to increase the learning rate using a small 
amount of teaching data generated by a human expert. Amor 
and Rettinger (2005) developed a genetic algorithm that uses 
SOMs to enhance the search strategy and confront genetic 
drift. They found that representing the search history by the 
SOM provides visual insights into the state and course of 
evolution. De Buitleir et al. (2012) created an artificial life 
population where the agents have sufficient intelligence to 
discover patterns in data and to make survival decisions based 
on those patterns using diploid reproduction, Hebbian 
learning, and SOMs. Saunders and Gero (2001) used SOMs in 
their artificial life society of agents, enabling the agents to 
determine the novelty of new artifacts without sampling the
entire design space. 

In this work we propose a recursive SOM, where the results 
from lower level SOMs are fed into higher level SOMs in 
order to enable abstract concept learning. Similar recursive 
SOMs have been proposed in the past, for example SOM²
(Furukawa, 2009) and ASSOM (Kohonen, 1996), which use 
lower level SOMs to find the building blocks used by higher 
level SOMs, when visualizing similar data. In this work we 
propose a novel approach whereby datatype-independent 
interpretations or summaries of many distinct and different 
lower level SOM results are fed as input into parent SOMs. 


In our work we also make extensive use of graph databases. 
Graph databases use graph structures for semantic queries 
with nodes, edges and properties to represent and store data 
(Angles & Gutierrez, 2008). SOMs have been used with 
semantic networks in the past with good results (Allinson et 
al., 2001) but in this work we also introduce another novel 
design feature: the SOMs are integrated into the network they 
are processing. By representing both data and the SOM in the 
same graph-based network the system becomes a dynamic 
“brain-like” network that integrates data and learning in the 
same structure, automatically reconfiguring itself when it 
needs to learn new concepts, and providing complete 
provenance for all nodes. 

Method 

Abstract concepts are not absolute concepts. In real life, the 
abstract concept of same/different means “degree of 
difference”. Objects with few or no apparent differences can 
be called “same”. Objects with many distinguishing features 
can be called “different”. Organisms may have a different 
concept of same/different depending on their experiences. For 
example, if we see many different species, then we learn to 
detect the differences between species - a flock of birds is 
different from a herd of buffalo. If we see many animals from 
one species, then we learn to detect the difference between 
individual animals - each buffalo is different from its 
companions. 

Therefore, to achieve the learning of same/different, it is 
not enough to learn the concept from one set of data. One 
must learn from multiple sets of data that some things within 
each set are the same and others within each set are different, 
and the exact meaning of the concept same/different will 
depend on the data sets presented. To achieve this, a recursive 
learning approach is needed: 

1. For each data set: the data is clustered into groups 
containing similar items. 

2. The clustered data sets are then clustered according to 
overall datatype-independent features of each set (in this 
case the number of clusters). 

3. (Optional) Step 2 is repeated for even higher-level 
concept learning. 

After performing second-order clustering, the abstract
concepts of same and different can be constructed. Clusters
containing data sets with fewer clusters have items that are
more similar to each other. Clusters containing data sets with
more clusters have items that are more different from each other.
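
The recursive scheme in steps 1 and 2 can be sketched in Python as follows. This is a deliberately simplified illustration and not the method used in this work: a 1-D gap-threshold grouping is swapped in for the SOM at both levels, and the function names, tolerances and toy datasets (P, Q, R, S) are all hypothetical.

```python
def count_clusters(values, tol):
    """Number of 1-D groups, starting a new group when a gap exceeds tol."""
    ordered = sorted(values)
    return 1 + sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > tol)


def group_by_gap(named_values, tol):
    """Group names whose values are chained by gaps of at most tol."""
    ordered = sorted(named_values.items(), key=lambda kv: kv[1])
    groups = [[ordered[0][0]]]
    for (_, prev), (name, value) in zip(ordered, ordered[1:]):
        if value - prev > tol:
            groups.append([name])
        else:
            groups[-1].append(name)
    return groups


datasets = {
    "P": [5.0, 5.0, 5.0, 5.0],
    "Q": [0.2, 0.2, 0.2],
    "R": [1.0, 9.0, 1.0, 20.0],
    "S": [3.0, 40.0, 3.0, 80.0, 3.0],
}
# Step 1: first-order clustering of each dataset.
meta = {name: count_clusters(values, tol=0.5)
        for name, values in datasets.items()}
# Step 2: second-order clustering on the datatype-independent summary
# (the number of clusters found in each dataset).
concepts = group_by_gap(meta, tol=1)
```

Here P and Q each yield one cluster and end up in one second-order group (interpretable as "same"), while R and S each yield three clusters and form another group ("different"). The second level never sees the raw values, only the cluster counts.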

By feeding the results of the learning algorithm back into 
itself in this way, the higher-order, abstract concepts can be 
learned (and indeed ever-higher level, more abstract concepts 
can be learned by recursing further). Uniquely to this work, 
the input data, the SOM grids, and the resulting clusters 
derived from the SOMs, are all stored within the same graph 
structure. In order to achieve this while making every data 
item traceable from high-level concept to input vector, this 
work uses a graph database to store all data, learned models, 
and concepts. Unlike traditional relational databases, graph 
databases enable provenance. The provenance of a data item 
is the lineage of that data item, describing what it is and how it 
came to be. The provenance about a data item includes details 
about processes and input data used to create the item. This is
of great benefit when constructing new networks of concepts
and abstract concepts, as it becomes possible to trace exactly 
which data resulted in which abstract concepts. Graph 
databases also enable fast traversals of large-scale network- 
based data - not something that traditional relational databases 
support well. 

These approaches are also chosen because of their 
similarity to neural systems and their power and scalability to 
large datasets and large numbers of features per data item 
(Kohonen, 2013). The system is implemented in Java and uses
the Neo4j graph database¹, also chosen for its ability to
scale and support high-speed queries.

In more detail, the method works as follows. 

Build Dataset Graph 

Each dataset is encoded as a graph, see Figure 1. A new 
Vector node is created for each data item in the set, with data 
attributes encoded as a list of numeric properties in each 
Vector. The Dataset node links to each Vector node (with a 
HAS_MEMBER link). The Dataset node also links to a 
Datatype node, containing a type variable with unique value 
for each dataset (with HAS_TYPE link). 
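
The structure of the dataset graph can be illustrated with a small in-memory sketch in Python. It is not the Neo4j-based implementation: the `Graph` class and its methods are hypothetical stand-ins for a graph database, and only the node labels and link types (Dataset, Datatype, Vector, HAS_MEMBER, HAS_TYPE) come from the description above.

```python
import itertools


class Graph:
    """Tiny in-memory property graph (a stand-in for a graph database)."""

    def __init__(self):
        self.nodes = {}   # node id -> {"label": str, "props": dict}
        self.links = []   # (source id, link type, target id)
        self._ids = itertools.count()

    def add_node(self, label, **props):
        node_id = next(self._ids)
        self.nodes[node_id] = {"label": label, "props": props}
        return node_id

    def add_link(self, source, link_type, target):
        self.links.append((source, link_type, target))


def build_dataset_graph(graph, type_name, items):
    """Encode one dataset as Dataset, Datatype and Vector nodes."""
    dataset = graph.add_node("Dataset")
    datatype = graph.add_node("Datatype", type=type_name)
    graph.add_link(dataset, "HAS_TYPE", datatype)
    for item in items:
        # One Vector node per data item, attributes stored as a numeric list.
        vector = graph.add_node("Vector", values=list(item))
        graph.add_link(dataset, "HAS_MEMBER", vector)
    return dataset


g = Graph()
ds = build_dataset_graph(g, "demo", [[33.0], [33.0], [33.0], [33.0]])
members = [t for s, kind, t in g.links if s == ds and kind == "HAS_MEMBER"]
```

Because every Vector is reachable from its Dataset through an explicit link, any later node derived from it can be traced back to the input data.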


Figure 1. Each dataset is stored in a graph database
(HAS_MEMBER links are in black, HAS_TYPE in grey).

Build SOM graph

An SOM graph is created, see Figure 2. The SOM is encoded
as an SOM node connected to an N x N grid of Model
nodes (using HAS_MODEL links), which are connected to their
neighbors (using NEIGHBOUR links). Each Model node
contains a list of properties corresponding to the attributes of
the data. Each HAS_MODEL link has integer properties
"row" and "col", which represent the position of the Model
within the grid.

Figure 2. 4x4 SOM graph. HAS_MODEL links are shown in
grey. NEIGHBOUR links are shown in black.

First Order Learning

To classify the Dataset using the graph-based SOM,
Algorithm 1 is used. Where no Models exist for a
dataset of this type, each SOM is initialized by setting half of
the attributes in each Model to r/(N-1) and the other half to
c/(N-1), where (r, c) denotes the rectangular coordinate of the
Model, and N is the size of the grid. This common method of
initialization was chosen after preliminary experiments on
standard datasets (e.g., Iris, Car, Wine, and Zoo from the UCI
Machine Learning repository²) showed it was more effective
than other alternatives. It also has the advantage of being
deterministic and fast. Where a dataset of this type has been
seen before, there will exist Models linked to the Datatype
node; these are used to initialize the SOM to make use of past
learning and improve speed. (Preliminary experiments showed
that the reuse of previously learned Models can reduce the
number of iterations needed for the SOM to converge.)


Algorithm 1. Learning and abstracting concepts from a dataset 
using the graph-based SOM. 
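
The deterministic initialization described under First Order Learning can be sketched as below. Which half of the attributes receives the row term and which the column term is not stated in the text, so the ordering used here (first half row, remainder column) is an assumption, as are the function and variable names.

```python
def init_models(n, dim):
    """Return {(row, col): weights} for an n x n grid of Models (n > 1)."""
    half = dim // 2
    models = {}
    for r in range(n):
        for c in range(n):
            # Assumption: the first half of the attributes gets r/(N-1),
            # the remaining attributes get c/(N-1).
            models[(r, c)] = ([r / (n - 1)] * half
                              + [c / (n - 1)] * (dim - half))
    return models


# A 4x4 grid of Models for data items with 4 attributes.
models = init_models(4, 4)
```

With this scheme the Model weights vary linearly across the grid between 0 and 1, so the map starts already ordered and the result is reproducible between runs.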


2 http://archive.ics.uci.edu/ml/ 


1 http://neo4j.com/

Figure 3. A graph-based SOM is trained on normalized vectors from the Dataset.


The SOM training procedure is taken from (Kohonen, 
2013). On every iteration, each Model value is updated: 


\[
m_i = \frac{\sum_j n_j h_{ji} \bar{x}_j}{\sum_j n_j h_{ji}},
\quad \text{where } h_{ji} = \frac{1}{gd(i,j) + 1}
\tag{1}
\]

and where: m_i denotes the updated value of Model i; \bar{x}_j
denotes the mean of the vectors that are closest (according to
the Euclidean metric) to the Model j; n_j denotes the number
of those vectors; gd(i, j) denotes the grid distance between
Models i and j; h is a neighbourhood function
(Kohonen, 2013).
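
One iteration of this batch update can be sketched in Python as follows. The sketch rests on two assumptions that should be checked against Kohonen (2013): that each Model becomes the neighbourhood-weighted mean of the per-Model data means, and that the neighbourhood function is h = 1/(gd + 1). The function names and the two-Model toy example are hypothetical.

```python
def batch_update(models, coords, data, gd):
    """One batch SOM iteration; returns the new list of Model vectors."""
    dim = len(models[0])
    sums = [[0.0] * dim for _ in models]
    counts = [0] * len(models)
    for x in data:
        # Best-matching Model under the (squared) Euclidean metric.
        j = min(range(len(models)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(models[k], x)))
        counts[j] += 1
        sums[j] = [s + a for s, a in zip(sums[j], x)]
    updated = []
    for i in range(len(models)):
        num = [0.0] * dim
        den = 0.0
        for j in range(len(models)):
            if counts[j] == 0:
                continue
            h = 1.0 / (gd(coords[i], coords[j]) + 1)  # assumed neighbourhood
            # n_j * h_ji * mean_j equals h_ji * (sum of vectors mapped to j).
            num = [n + h * s for n, s in zip(num, sums[j])]
            den += h * counts[j]
        updated.append([n / den for n in num] if den else list(models[i]))
    return updated


def cube_distance(a, b):
    """Sum of absolute differences of cubic coordinates."""
    return sum(abs(p - q) for p, q in zip(a, b))


# Two neighbouring Models in the first grid row, given in cubic coordinates.
coords = [(0, 0, 0), (1, -1, 0)]
models = [[0.0], [1.0]]
data = [[0.0], [1.0]]
new_models = batch_update(models, coords, data, cube_distance)
```

In the toy example each Model is pulled towards the data mapped to its neighbour, giving values close to 0.25 and 0.75.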

The grid distance gd between Models is the distance 
between the cubic coordinates (ax, ay, az) and (bx, by, bz) of 
each Model M and is defined in Eqn. 2. 


\[
gd(M_{ax,ay,az}, M_{bx,by,bz}) = |ax - bx| + |ay - by| + |az - bz|
\tag{2}
\]


The cubic coordinate of a Model (x, y, z) is a transformation 
of its rectangular coordinate (r, c) defined in Eqn. 3. 

(x,y,z) = {c-^ ± ^ 1 ,-x-z,r) (3) 


This method to compute the grid distance is chosen to 
optimize performance of a heavily used neighborhood 
function calculation. 

After every Model is updated, the mean energy of the 
iteration is calculated (Eqn. 4). 

\[
E^* = \frac{1}{N^2} \sum_i d(m_i^*, m_i)
\tag{4}
\]

where m_i^* denotes the Model value before update; m_i denotes
the Model value after update; d(x, y) denotes the Euclidean
distance between vectors x and y; N denotes the size of the
grid, and N^2 thus denotes the number of Models.

Each SOM is trained for a maximum of 50 iterations or
until the mean energy E* is less than 10^-8. The
neighborhood function and other values were also chosen
after performing preliminary experiments on standard UCI
datasets. Figure 3 illustrates the graph during the SOM
learning phase. After learning is complete, the clusters are 
identified (see Algorithm 2) and Cluster nodes are linked to 
Models. Any unused Models and all normalized vectors are 
then removed from the SOM graph. This helps reduce the 
space taken for learning, while retaining useful Models in case 
the same type of data is presented again in the future. A 
MetaVector is then created, with a value that equals the 
number of clusters found. Figure 4 illustrates the graph after 
learning is complete and the SOM has been removed. 
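
The stopping rule (at most 50 iterations, or a mean energy below 10^-8) can be sketched as a generic training loop. The sketch assumes that the mean energy is the average Euclidean distance between each Model before and after an update; `train`, `mean_energy` and the toy `halve` step are hypothetical names, and `step` stands for any function that performs one SOM update.

```python
import math


def mean_energy(before, after):
    """Mean Euclidean distance between Models before and after an update."""
    total = sum(math.dist(b, a) for b, a in zip(before, after))
    return total / len(before)


def train(models, step, max_iters=50, tol=1e-8):
    """Apply `step` until the mean energy drops below tol or max_iters is hit."""
    for iteration in range(1, max_iters + 1):
        updated = step(models)
        energy = mean_energy(models, updated)
        models = updated
        if energy < tol:
            break
    return models, iteration


# Toy step that halves every weight, so the energy shrinks geometrically.
halve = lambda ms: [[w / 2 for w in m] for m in ms]
final, iterations = train([[1.0], [1.0]], halve)
```

With the toy step the energy at iteration k is 2^-k, so the loop stops at iteration 27, well before the 50-iteration cap.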



Algorithm 2. Finding clusters within the graph-based SOM. 



Figure 4. Similar Models are grouped into Clusters, the SOM 
and its unused Models are discarded, the normalized vectors 
are removed, and a MetaVector is created, summarizing the 
number of clusters in the Dataset, and added to the Meta 
Dataset. (A vector connected to the third Model is shown in 
pale grey for clarity.)

Figure 5. A new SOM is created and trained on the normalized MetaVectors from multiple different Datasets, which may comprise
different types of data. (Graphs linked to each Dataset are not shown for clarity.) 



Figure 6. Similar Models in the SOM are linked to Clusters, 
the SOM and its unused Models are discarded, and the 
Normalized MetaVectors are removed as before. (New links 
between Models and Meta Datasets and Datatypes are not 
shown for clarity.) 

Abstract Concept Learning 

Each time First Order Learning is performed on a new 
Dataset, a new MetaVector node (linked to its corresponding 
Dataset) is created and linked to the Meta Dataset. Abstract 
Concept Learning is then performed by creating a new SOM 
to train on the set of MetaVectors. To perform Abstract 
Concept Learning, Algorithm 1 is used again on this Meta 
Dataset, where T = “meta”, and Vectors are MetaVectors. 
Figure 5 illustrates the learning phase as the SOM is trained 
on the Normalised MetaVectors. Figure 6 illustrates the graph 
after learning is complete and the SOM has been removed. 
The resulting emergent clusters represent abstract concepts 
created - in Figure 6, one cluster represents the concept 
“same” as it points to Datasets containing only single clusters; 
the other cluster represents “different” as it points to a
Dataset containing multiple clusters (the MetaVector value is 
3, representing 3 clusters). The final step of Algorithm 1 is 
optional (its execution would produce an even higher level of 
abstraction). If desired, such recursion could continue until 
everything is grouped into a single cluster. 

Experiments 

Experiment 1 

The first experiment investigates whether the system can learn
the higher order concept of "sameness" when presented with
simple data similar to the simple symbols shown to animals
during S/D experiments, see Table 1 (Katz et al., 2007). Table
2 shows the six simple sets of data, each set containing no 
value that appears in an earlier set. These sets were presented 
to the system in turn, allowing learning to complete before the 
next set was presented. SOMs were 4x4 in size with settings 
as described earlier. 

Results 1 

The abstract concepts emerged incrementally. As each new dataset was
presented, the first level SOM clustered the data, the clusters 
were identified, and the MetaVectors summarizing the 
clusters were then clustered to produce abstract concepts. 
Initially the first two datasets were clustered together - the 
system had correctly identified that both A and B contain 
items that are the Same. After dataset C was presented, the 
system created a new cluster containing C, representing the 
abstract concept Different. Datasets D and E were also placed 
into the Different cluster, see Figure 7 (Left). However, after 
the final dataset F was presented, the system had gained 
enough insight to reorganize the datasets, finally clustering A 
and B together (Same), C and D (Slightly Different), and E 
and F together (Very Different), see Figure 7 (Right). This 
seems a valid view of the data given that datasets E and F 
contain 10 and 9 unique values whereas C and D only 
comprise 4 values, of which only 1 or 2 are different. 


[Table 1: two example displays of symbols, one labelled Same
Display and one labelled Different Display.]


Table 1. Typical animal experiment data (Katz et al., 2007).


Data set   Values                                        S/D

A          33, 33, 33, 33                                S
B          7.1, 7.1, 7.1, 7.1                            S
C          102, 6, 102, 102                              D
D          31, 1.5, -3, 31                               D
E          1, 2, 3, 4, 5, 6, 7, 8, 9, 10                 D
F          1.0001, 1.0002, 1.0003, 1.0004, 1.0005,       D
           1.0006, 1.0007, 1.0008, 1.0009


Table 2. Input data sets for Experiment 1.

Figure 7. Left: the graph after presenting the first 5 datasets A to E. MetaVectors are shown as MV. The second-order SOM has
discovered that there are two types of dataset: A,B (corresponding to Same) and C,D,E (corresponding to Different). Right: the 
graph after presenting the final dataset F. With more experience, the second order SOM now better distinguishes between the 
different types of dataset, by identifying three kinds: A,B (corresponding to Same), C,D (corresponding to Slightly Different) and 
E,F (corresponding to Very Different). 


Experiment 2: Visual data 

In many animal S/D experiments, instead of showing simple 
symbols, actual images of objects such as cars or flowers are 
shown (Katz & Wright, 2006). In the second experiment we 
take inspiration from these experiments and allow the system 
to “watch” an animation in which an event occurs after a 
specific period of time. Here we wish to determine if the 
system can understand when the frames of the animation are 
mostly the Same, discover the point in time when they 
become Different, and understand that new, unseen frames 
after the event may also be considered the Same once again. 
This is a more practical application of the ability to detect S/D 
and could, for example, be used to detect some anomalous 
event occurring in a video feed from a security camera. It is a 
more challenging task as the data presented to each SOM is 
considerably larger than in Experiment 1. To enable the 
input to be carefully controlled, in Experiment 2 we use an 
8x8 10-frame color animation of the video game character 
Mario, see Figure 9. Each frame was converted to a vector of 
8x8x3 = 192 RGB values. An 8x8 SOM grid was used, and all 
other settings were the same as in Experiment 1. Three 
different scenarios were presented to the system. 
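
As a minimal sketch of the frame encoding described above, the snippet below flattens one 8x8 RGB frame into a 192-element vector. The nested-list frame layout, the row-major ordering of values and the function name `frame_to_vector` are assumptions made for illustration; the text does not specify the ordering.

```python
def frame_to_vector(frame):
    """Flatten a height x width x 3 nested list of RGB values into one flat list."""
    return [channel for row in frame for pixel in row for channel in pixel]


# Hypothetical all-black 8x8 frame: every pixel is (0, 0, 0).
black_frame = [[(0, 0, 0) for _ in range(8)] for _ in range(8)]
vector = frame_to_vector(black_frame)
print(len(vector))  # 8 * 8 * 3 = 192
```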

Scenario 1: Jump after 10 frames: 100 frames of animation 
are presented, 5 at a time (e.g., 2-3-4-2-6, 7-2-3-4-2). The first 
10 frames are Mario walking. The next 5 frames depict Mario 
jumping out of the picture. The remaining 85 are black. 

Scenario 2: Jump after 50 frames: 100 frames of animation 
are presented, 5 at a time. After 50 frames of walking, the next 
5 depict Mario jumping, and the remaining 45 are black. 

Scenario 3: Jump after 90 frames: 100 frames of animation 
are presented, 5 at a time. After 90 frames of walking, the next 
5 depict Mario jumping, and the final 5 are black. 
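
The layout of the three scenarios can be sketched as follows. This is only an illustration of how 100 frames split into 5-frame chunks, assuming chunks are numbered from 1 (which agrees with the chunk numbers quoted in the Figure 8 caption) and that the walking phase is a whole number of chunks; the labels and function names are hypothetical.

```python
CHUNK_SIZE = 5      # frames presented at a time
TOTAL_FRAMES = 100  # frames per scenario


def event_chunk(walking_frames, chunk_size=CHUNK_SIZE):
    """Return the 1-based index of the chunk in which the jump frames appear."""
    return walking_frames // chunk_size + 1


def scenario_labels(walking_frames, jump_frames=5, total=TOTAL_FRAMES):
    """Label every frame of a scenario as 'walk', 'jump' or 'black'."""
    black_frames = total - walking_frames - jump_frames
    return ["walk"] * walking_frames + ["jump"] * jump_frames + ["black"] * black_frames


for walking in (10, 50, 90):
    labels = scenario_labels(walking)
    chunks = [labels[i:i + CHUNK_SIZE] for i in range(0, len(labels), CHUNK_SIZE)]
    # Prints: 10 20 3, then 50 20 11, then 90 20 19
    print(walking, len(chunks), event_chunk(walking))
```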

Results 2 

Figure 8 shows a summary of the clusters for each scenario. In 
the first scenario the frames are considered Same (one meta 
cluster) until the Event on chunk 3 (frames 10-15) when 
Mario jumps. Here the first SOM finds two clusters, resulting 
in the higher level SOM forming a new meta cluster 
corresponding to the abstract concept Different. Following the 
Event, new identical black frames are presented, which are 
correctly classified as Same. This pattern occurs in the other 
two scenarios, except that during the Event the first level 
SOM finds three clusters. In all cases, the system learns that 
most of the animation frames in each chunk can be considered 
the Same, but during each Event, one chunk of frames comprises 
Different frames. 
Figure 8. The number of clusters found by the first level SOM for each 5-frame chunk and meta clusters found so far by the second 
level SOM. The Event (Mario jumping out of shot) occurs during chunk 3 for Scenario 1, chunk 11 for Scenario 2 and chunk 19 for 
Scenario 3. On each Event the system learns and remembers that there are two classes of animation frames: Same and Different. 
Figure 9. The 10 different frames of the Mario animation, 
comprising the walking sequence of frames: 2-3-4-2-6-7-2-3..., 
blinking sequence: 1-8-1-8-1..., and jumping sequence: 
4-9-10-0-0-0... 



Figure 10. (A) examples of a set of stored Models 
corresponding to the learned abstract concept Different. (B) 
and (C) show two examples of stored Models corresponding 
to the learned abstract concept Same, where (B) is a “hazily 
remembered Mario” and (C) comprises black frames. 


A further test was performed by introducing the blinking 
Mario chunk (frames 1-8-1-8-1) at various times. This chunk 
of new frames could be regarded as Same (since consecutive 
frames differ by a single pixel) or as Different (there are 
two clear groups of different frames: 1 and 8). Interestingly, 
the system seemed ambivalent towards this chunk of animation. 
It generally regarded the chunk as Same if it was presented 
after seeing many other Same chunks. It regarded the chunk as 
Different if it was presented after seeing the Event, which it 
considers Different. 

Discussion 

Abstract concept learning is considered to form the basis of 
higher order cognition in humans. This work has shown that 
in a very real sense, the notion of the abstract concept 
Same/Different is indeed higher order. For an SOM to 
discover this datatype-independent abstract concept, the 
output from one SOM must be fed into a higher-level SOM, 
which must learn about the general, datatype-independent 
features (in our case, the number of clusters) found by the 
first. This learning of datatype-independent features is an 
important requirement. Unlike previous work on recursive 
SOMs which build improved views of the same kind of data 
by using lower level SOM clusters as building blocks, here we 
show that it is necessary for the higher level SOM to learn 
about sets of overall interpretations or summaries of the 
datasets derived from the lower-level SOMs. By doing so, the 
system is able to accept radically and completely different 
datasets that share nothing in common, and still determine 
whether each dataset comprises items that should be 
considered largely Same or Different. With enough examples, 
the system can also start to differentiate further, and find 
Same, Slightly Different and Very Different. It should be 
stressed that these abstract concepts are automatically 
generated and they will change depending on what kinds of 
data are presented. Like the S/D abstract concept of living 
creatures, the system here only understands Same and 
Different in terms of its experience of different data sets, not 
in terms of any absolute comparisons of values. 
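
The role of a datatype-independent summary can be illustrated with a deliberately simplified sketch. It is not the SOM implementation described in this paper: the first-level SOM is replaced by a count of distinct values in each dataset, and the second-level SOM by grouping datasets that share the same count. The data are those of Table 2; the function names are hypothetical.

```python
from collections import defaultdict

# Input data sets from Table 2.
DATASETS = {
    "A": [33, 33, 33, 33],
    "B": [7.1, 7.1, 7.1, 7.1],
    "C": [102, 6, 102, 102],
    "D": [31, 1.5, -3, 31],
    "E": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "F": [1.0001, 1.0002, 1.0003, 1.0004, 1.0005,
          1.0006, 1.0007, 1.0008, 1.0009],
}


def summary(values):
    """Stand-in for the first-level SOM: number of distinct values found."""
    return len(set(values))


def group_by_summary(datasets):
    """Stand-in for the second-level SOM: group datasets with equal summaries."""
    groups = defaultdict(list)
    for name, values in datasets.items():
        groups[summary(values)].append(name)
    return dict(groups)


print({name: summary(values) for name, values in DATASETS.items()})
print(group_by_summary(DATASETS))
```

Only A and B share a summary exactly in this sketch. A SOM, unlike this exact grouping, can place nearby counts (2 and 3, or 9 and 10) in the same cluster, which is what allows the groupings C,D and E,F reported in Figure 7.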

The use of a graph database for this approach also enables 
the system to explain its understanding. We are able to query 
the database and ask it for examples of Same or Different that 
it has experienced. Even when the original datasets have been 
removed, the stored SOM Models are able to provide a “hazy 
memory” of its notion of each abstract concept, see Figure 10. 
The connections from each Model via MetaVectors and 
Models to Dataset types enable the tracking of the 
provenance of each Model: it is possible to know which 
dataset resulted in each abstract concept or which “hazy 
memory”. 
Conclusion 

In this work we have presented a novel recursive SOM, where 
a datatype-independent summary of the output from 
lower-level SOMs that have been applied to different datasets is 
fed into a higher-level SOM in order to learn the abstract concepts 
of Same/Different. The implementation exploits graph-based 
computing, with data, SOM grid, learned clusters and all 
concepts represented in the same graph database. This 
provides the advantage of provenance for all nodes, enabling 
details about processes and input data used to create the item 
to be found highly efficiently. The graph representation 
reorganizes itself during and after learning, adding new 
concepts as they are discovered, removing nodes when they 
are no longer needed, and reusing stored nodes to improve 
efficiency of learning. The use of the graph database also 
enables this approach to scale. 

The method was tested on simple same/different datasets 
designed to resemble those used in animal experiments and 
then a more practical application of same/different learning 
was investigated - finding anomalous frames within a short 
animation. In all cases the system demonstrated a clear ability 
to learn the datatype-independent abstract concepts of 
Same/Different correctly, and this “skill” was refined and 
improved as new data was presented. 

There are many potential applications for this system, 
where learning of Same/Different is non-trivial. Examples 
include the identification of similar profiles in large databases, 
anomalous events in video streams, or the identification of 
other higher-level concepts such as homographs and 
synonyms. Future work will explore some of these ideas. 

References 

Allinson, N., Yin, H., Allinson, L., & Slack, J. (2001). Advances in 
self-organising maps. Springer, London. 

Aly, S., Sagheer, A., Tsuruta, N., & Taniguchi, R.-i. (2008). Face 
recognition across illumination. Artificial Life and Robotics, 
12(1-2): 33-37. 

Amor, H. B., & Rettinger, A. (2005). Intelligent exploration for genetic 
algorithms: Using self-organizing maps in evolutionary 
computation. Proceedings of the 7th Annual Conference on Genetic 
and Evolutionary Computation, pages 1531-1538. 

Angles, R., & Gutierrez, C. (2008). Survey of graph database models. 
ACM Computing Surveys, 40(1): 1-39. 

de Buitleir, A., Russell, M., & Daly, M. (2012). Wains: A pattern-seeking 
artificial life species. Artificial Life, 18(4): 399-423. 

Furukawa, T. (2009). SOM of SOMs. Neural Networks, 22(4): 463-478. 

Goodman, G. S., & Melinder, A. (2007). Child witness research and 
forensic interviews of young children: A review. Legal and 
Criminological Psychology, 12(1): 1-19. 

Kamimoto, N., Yamada, Y., Kitamura, M., & Nishikawa, K. (2005). 
Evaluation of vibration in many positions by SOM. Artificial Life 
and Robotics, 9(1): 7-11. 

Katz, J. S., & Wright, A. A. (2006). Same/different abstract-concept 
learning by pigeons. Journal of Experimental Psychology: Animal 
Behavior Processes, 32(1): 80-86. 


Katz, J. S., Wright, A. A., & Bachevalier, J. (2002). Mechanisms of same- 
different abstract-concept learning by rhesus monkeys (Macaca 
mulatta). Journal of Experimental Psychology: Animal Behavior 
Processes, 28(4): 358-368. 

Katz, J. S., Wright, A. A., & Bodily, K. D. (2007). Issues in the 
comparative cognition of abstract-concept learning. Comparative 
Cognition & Behavior Reviews, 2, 79-92. 

Kohonen, T. (1996). Emergence of invariant-feature detectors in the 
adaptive-subspace self-organizing map. Biological Cybernetics, 
75(4): 281-291. 

Kohonen, T. (1998). The self-organizing map. Neurocomputing, 21(1): 1-6. 

Kohonen, T. (2013). Essentials of the self-organizing map. Neural 
Networks, 37, 52-65. 

Lawrence, S., Giles, C. L., Tsoi, A. C., & Back, A. D. (1997). Face 
recognition: A convolutional neural-network approach. IEEE 
Transactions on Neural Networks, 8(1): 98-113. 

Mackintosh, N. J. (2000). Abstraction and discrimination. In C. Heyes & 
L. Huber (Eds.), The Evolution of Cognition (pp. 123-141). 
Cambridge, MA, US: The MIT Press. 

Marcus, G. F., Vijayan, S., Rao, S. B., & Vishton, P. M. (1999). Rule 
learning by seven-month-old infants. Science, 283(5398): 77-80. 

Merelo, J. J., Andrade, M. A., Prieto, A., & Moran, F. (1994). 
Proteinotopic feature maps. Neurocomputing, 6(4): 443-454. 

Michie, D., Spiegelhalter, D. J., & Taylor, C. C. (1994). Machine 
Learning, Neural and Statistical Classification. Ellis Horwood, 
Upper Saddle River, NJ, USA. 

Okada, N., Qiu, J., Nakamura, K., & Kondo, E. (2009). Multiple 
self-organizing maps for a visuo-motor system that uses multiple 
cameras with different fields of view. Artificial Life and Robotics, 
14(2): 114-117. 

Piaget, J. (1970). Science of Education and the Psychology of the Child. 
Trans. D. Coltman. Orion, New York. 

Saunders, R., & Gero, J. S. (2001). A curious design agent. CAADRIA 
(The Association for Computer-Aided Architectural Design 
Research in Asia), pages 345-350. 

Shettleworth, S. J. (2009). Cognition, Evolution, and Behavior. Oxford 
University Press, USA. 

Siebeck, U. E., Parker, A. N., Franz, M. O., & Wallis, G. M. (2015). Face 
discrimination in fish. Behaviour, pages 1. 

Tateyama, T., Kawata, S., & Oguchi, T. (2004). A teaching method using 
a self-organizing map for reinforcement learning. Artificial Life and 
Robotics, 7(4): 193-197. 

Teranishi, M., Omatu, S., & Kosaka, T. (2009). Continuous fatigue level 
estimation for the classification of fatigued bills based on an 
acoustic signal feature by a supervised SOM. Artificial Life and 
Robotics, 13(2): 547-550. 

Thompson, R. K., & Oden, D. L. (2000). Categorical perception and 
conceptual judgments by nonhuman primates: The paleological 
monkey and the analogical ape. Cognitive Science, 24(3): 363-396. 

Wilson, W. J., Ferrara, N. C., Blaker, A. L., & Giddings, C. E. (2014). 
Escape and avoidance learning in the earthworm Eisenia hortensis. 
PeerJ, 2: e250. https://doi.org/10.7717/peerj.250 






How long did it last? Memorizing interval timings in a simple robotic task 


Julien Hubert and Takashi Ikegami 

The University of Tokyo, Ikegami Laboratory, Tokyo, Japan 
{jhubert,ikeg}@sacral.c.u-tokyo.ac.jp 


Introduction 

Time perception is the capacity to sense the passing of time, 
but in most living creatures it also involves memorizing how 
much time has passed, and eventually acting when it reaches 
a specific amount. The latter is referred to as interval timing. 
This capacity allows animals to detect temporally repeating 
events in their environment, avoid them if necessary, or 
exploit them if beneficial (Saigusa et al., 2008). 

While research in animals has focused on interval timing 
(Connor, 1985; Durstewitz, 2003), research in artificial 
life has limited itself to time perception (Maniadakis et al., 
2014; Trianni, 2008). Indeed, alife models rely on the 
intrinsic temporal properties of neural networks to encode the 
passing of time and, therefore, cannot estimate how much 
time has passed since the onset of a stimulus. Our work attempts 
to take one step closer to interval timing by designing an 
agent which must not only learn the duration of a stimulus, but 
also replay it later on. 


Experimental Setup 

Our task takes place inside an unbounded arena with two 
differently colored circular areas: the Stimulus Area (SA) 
where a stimulus is played, and the Replay Area (RA) where 
the agent must replay the duration of this stimulus. At the 
start of a trial, the agent is placed in SA at a random location 
and with a random orientation. After 1s, a stimulus is played 
for a duration of either 2s or 4s. The agent is free to move 
and can leave SA at any time while the stimulus is played. 
Once it leaves SA, the stimulus is stopped (if 
still playing), SA disappears, and RA becomes visible. The 
agent must move to RA and remain on it for the duration of 
the initial stimulus. The agent has 30s to complete the task 
before the experiment is terminated. Because the goal and 
the stimulus are never shown simultaneously, the agent must 
maintain the duration in memory until it reaches the goal. 

The agent is a simulated e-puck robot equipped with floor 
sensors, to detect the color of the two areas, and a compass 
indicating the direction of RA when visible. The controller 
of the agent is a CTRNN with 30 neurons (determined 
experimentally), including three inputs and two outputs (Beer, 
1995). The first two inputs encode the floor sensors (one 
input for each area), and the last one receives the compass 
value. The outputs encode the speed of the left and right 
motors. 
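
For readers unfamiliar with the controller, the sketch below integrates the standard CTRNN equation (Beer, 1995), tau_i dy_i/dt = -y_i + sum_j w_ji sigma(y_j + theta_j) + I_i, with one forward Euler step. The Euler scheme, the parameter names and the tiny two-neuron example are assumptions made for illustration; the text states only that the network has 30 neurons and that the simulation uses a timestep of 0.01s.

```python
import math


def sigmoid(x):
    """Standard logistic activation."""
    return 1.0 / (1.0 + math.exp(-x))


def ctrnn_step(y, tau, theta, weights, inputs, dt=0.01):
    """Advance neuron states y by one forward-Euler step of size dt.

    weights[j][i] is the connection strength from neuron j to neuron i.
    """
    n = len(y)
    outputs = [sigmoid(y[j] + theta[j]) for j in range(n)]
    new_y = []
    for i in range(n):
        synaptic = sum(weights[j][i] * outputs[j] for j in range(n))
        dydt = (-y[i] + synaptic + inputs[i]) / tau[i]
        new_y.append(y[i] + dt * dydt)
    return new_y


# Hypothetical two-neuron network with no connections and no input:
# each state simply decays toward zero.
state = ctrnn_step(y=[0.1, 0.05], tau=[1.0, 1.0], theta=[0.0, 0.0],
                   weights=[[0.0, 0.0], [0.0, 0.0]], inputs=[0.0, 0.0])
print(state)
```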

The parameters of the CTRNN are tuned using a genetic 
algorithm (GA) with tournament selection, one-point 
crossover and mutation (Holland, 1975). The fitness function 
is given by 

\[
\mathrm{fitness} = \frac{tosa}{stimdur + 1} + \left( \frac{tora}{stimdur} \right)^{\mathrm{sign}(stimdur - tora)} \qquad (1)
\]

where tosa and tora are respectively the time the agent 
spent on the areas SA and RA, and stimdur is the duration 
of the initial stimulation. The first term promotes the capacity 
to listen to the complete stimulus before leaving SA. It 
is artificially limited to a maximum of 1. The second term 
increases the fitness linearly while the agent has remained on 
RA for less than stimdur, but decreases it afterward. 
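
A minimal sketch of this fitness function follows, assuming Equation 1 is read as the sum of a listening term tosa/(stimdur + 1), capped at 1, and a replay term (tora/stimdur) raised to the power sign(stimdur - tora). That reading is inferred from the description of the two terms and is not confirmed by the authors; the function names are hypothetical.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def fitness(tosa, tora, stimdur):
    """Fitness of one trial from time on SA, time on RA and stimulus duration (s)."""
    listening = min(tosa / (stimdur + 1.0), 1.0)       # capped at 1
    replay = (tora / stimdur) ** sign(stimdur - tora)  # peaks at tora == stimdur
    return listening + replay


print(fitness(tosa=3, tora=2, stimdur=2))  # perfect trial: 2.0
print(fitness(tosa=3, tora=1, stimdur=2))  # left RA too early: 1.5
print(fitness(tosa=3, tora=4, stimdur=2))  # stayed on RA too long: 1.5
```

Under this reading the maximum fitness is 2, which is consistent with the success threshold of 1.95 used in the Results.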

The simulation is physics based and accurately encodes 
time with a timestep of 0.01s. Uniform noise is added to 
the position and orientation of the agent, and also to the 
duration of the initial stimulus (±5% uniform noise). The 
distance between SA and RA is fixed, but the random initial 
position of the agent prevents evolution from exploiting it. The 
initial activation values of the neurons in the CTRNN are 
drawn from a uniform random distribution with a maximum 
value of 0.1. 


Results 

Behavior 

Six evolutionary runs of 20,000 generations with a population 
of 100 individuals produced two successful CTRNNs for 
this task (fitness > 1.95). While their dynamics are slightly 
different, the evolved strategies are similar. 

Figure 1 shows the position of the two agents during one 
typical run for both durations. Despite individual differences, 
both agents remain in SA until the end of the stimulus, 
then exit the area to move toward RA. In RA, the agents 
arrange their passage to last the duration of the stimulus. 
Figure 1: Positions of two agents for durations 2s and 4s. 
The upper circle is SA, and the lower one is RA. 


To determine if the memory can be forgotten, we let the 
experiment run for 1000s and record the behavior of each 
agent. We observed that the agents return to RA quickly 
after leaving it. The duration of each stay within RA is 
shown in Figure 2. For both agents, we can observe that 
the initial trial is successful, i.e. each agent remains on RA 
for the expected duration. Later on, the duration on RA 
increases progressively to stabilize at a fixed value (around 5s 
for agent 1, 4.5s for agent 2). This indicates two properties 
of the evolved agents. First, the memory is not fixed and 
disappears progressively with the time spent exploring the 
environment. Second, each network possesses a natural 
tendency to remain on RA for a specific duration. Interestingly, 
this duration is neither of the ones we evolved the agents with. 
Figure 2: Time spent in and out of RA for each agent when 
the duration of the experiment is extended to 1000s. The 
inset shows the trajectory of the robot during the experiment. 


Neural Dynamics 

In order to get a better understanding of how the CTRNN 
solves the task, we performed a principal component analysis 
(PCA) on the activation of the neurons. The data compiled 
for the PCA come from 50 trials with 2s and 50 
trials with 4s. Figure 3 shows the two strongest principal 
components (the 3rd one does not change the trajectory, and 
the others contain little information). In both agents, the 
dynamics while in SA remain the same for any duration. The 
dynamics of the networks for both durations diverge when 
the stimulus disappears, and move toward a similar region 
in the dynamical landscape. Between the end of the stimulus 
and the end of the experiment, the state of the network 
moves toward the final attractor which is the end of the trial. 
The important aspect of this analysis is that it does not show 
any fixed attractor connected to the memory of the duration. 
The memory is implemented within the transition toward the 
final attractor. 



Figure 3: PCA of the neural activity of both agents. 


Conclusion 

The two agents presented in this work are capable of evaluating 
and memorizing the duration of 2s and 4s stimuli, moving 
in their environment and replaying this memory when needed. 
Our analyses determined that the agents are not implementing 
a stable attractor for one duration combined with a natural 
tendency for the other one. The memory of the duration 
is encoded through the transient dynamics of the network, 
which allows the network to forget what it learned previously. 
From our observations, it also allows the network 
to memorize the small variations of the duration due to the 
noise, indicating it memorizes the duration of the stimuli, 
and not just a cue indicating 2s or 4s. 
Abstract 

Memory is an essential component of intelligence as it enables 
an individual to make informed decisions based on past 
experiences. In the context of biological systems, however, 
what selective conditions promote the evolution of memory? 
Given that reliable memory is likely to be associated with 
costs, how much is it actually worth in different contexts? 
We use a genetic algorithm to measure the evolutionary 
importance of memory in the context of the Iterated Prisoner’s 
Dilemma, a game in which players receive a short-term gain 
for defection, but may obtain greater long-term benefits with 
cooperation. However, cooperation requires trust; cooperating 
when an opponent defects is the worst possible outcome. 
Memory allows a player to recall an opponent’s previous 
actions to determine how trustworthy that opponent is. While 
a player can earn a high payout by defecting, it will likely 
lose the trust of an opponent with memory, yielding a lower 
long-term payout. We determined the value of memory in the 
Iterated Prisoner’s Dilemma under various conditions. When 
memory is costly, players reduce their available memory and 
use short-term greedy strategies, such as “Always Defect”. 
Alternatively, when memory is inexpensive, players use 
well-known cooperative strategies, such as “Tit-for-Tat”. Our 
findings indicate that organisms playing against a static opponent 
evolve memory as expected. However, memory is much more 
challenging to evolve in coevolutionary scenarios where its 
value is uneven. 

Introduction 

Biological evolution has produced our only examples thus 
far of general intelligence. As such, understanding the 
evolutionary process, both how it occurred in nature and how 
we can replicate it in a computer, may prove important on 
the path to developing artificial intelligence. One important 
component of such research is understanding the role of 
memory. Memory is the foundation of learning, allowing an 
individual to alter its future behavior based on prior stimuli 
(Sherry and Schacter, 1987). As such, memory is critical for 
such behaviors as navigating, tracking, foraging, avoiding 
predators, hunting prey and cooperating with others (Dunlap 
and Stephens, 2009; Grabowski et al., 2010; Liverence and 
Franconeri, 2015; Kraines and Kraines, 2000; Soto et al., 
2014). These behaviors are sufficiently beneficial to fitness 
that memory is advantageous to many individuals despite 
the associated biological costs (Barton, 2012; Dukas, 1999; 
Mayley, 1996). Understanding the importance of memory 
and the conditions under which memory evolves is crucial 
as it is a fundamental component of both real and artificial 
organisms. 

To study the selective pressures that lead to the early 
evolution of memory, we need a way to measure their impact 
on memory’s value. Here, we propose a technique for 
performing a cost-benefit analysis of memory via a simple 
evolutionary simulation. As an environment for this simulation, 
we will use the classic game theoretic problem, Iterated 
Prisoner’s Dilemma (IPD). Game theory provides a tractable 
framework for studying the value of memory in social 
contexts. IPD specifically is an ideal choice, because it is 
well-understood, requires memory for optimal performance, and 
is commonly used as a model system for studying cooperation 
(Axelrod, 1987; Crowley et al., 1996; Kraines and 
Kraines, 2000; Golbeck, 2002). In this game, two players 
repeatedly interact; at each step, they may cooperate with or 
defect from each other, and are rewarded according to the 
Prisoner’s Dilemma payout matrix (see Table 1). The fact 
that IPD is so well-studied allows us to thoroughly validate 
this approach to studying memory. At the same time, we can 
gain useful insights into a relatively intuitive system before 
tackling more complex ones. 

To assess the value of memory in this environment, we 
use a genetic algorithm to evolve strategies for playing IPD. 
Strategies in this algorithm are allowed to use memory, but 
at a cost. They must sacrifice part of their payout to have 
and use memory. By imposing a series of different memory 
costs and observing under which costs memory-using strategies 
evolve, we can measure the value of memory in this 
evolutionary context. Allowing evolution to generate novel 
strategies, rather than hard-coding in well-known strategies 
and allowing them to compete, ensures that we are not 
inadvertently introducing our own biases to the study system. 
To further ensure the validity of our system, we initially test 
it in a static environment where all players compete against 
a fixed set of three strategies. Overall, this system should 
allow the evolution of individuals that use successful strategies 
in IPD, allowing us to determine the value of memory. 

Iterated Prisoner’s Dilemma 

Three commonly-employed strategies for IPD are Always 
Defect, Always Cooperate, and Tit-for-Tat (Brunauer et al., 
2007). The first two are the repetition of one action (defect 
or cooperate, respectively), while Tit-for-Tat is a strategy 
that repeats whatever action a player’s opponent performed 
last. Always Defect and Always Cooperate do not require 
memory, as they do not rely on the history of a player’s 
actions or those of its opponent. Tit-for-Tat, however, does 
require memory. In a single iteration of Prisoner’s Dilemma, 
the best possible strategy is Always Defect; regardless of 
the opponent’s decision, defecting will always yield a higher 
payout on a given iteration than cooperating would have (see 
Table 1). This consistent benefit makes Always Defect a 
selfish/greedy strategy (Axelrod, 1987). 
D    T = 5    P = 1 


Table 1: Payouts To Row-Player for Prisoner’s Dilemma 

Fitness is determined based on this matrix. Four payouts 
are possible: Reward (R), Sucker (S), Temptation (T), and 
Punishment (P). These payouts are a result of whether the 
player and the opponent each cooperate (C) or defect (D). 
In a single iteration, T is the highest payout for a single 
player. However, when playing repeated iterations of 
Prisoner’s Dilemma, players can retaliate against each other, 
yielding lower payouts for both than if they had cooperated 
consistently. 

When playing multiple iterations, cooperative strategies, 
such as Tit-for-Tat, outcompete the Always Defect strategy 
by allowing for the higher rewards associated with long-term 
cooperation (Axelrod, 1987; Crowley et al., 1996; Golbeck, 
2002). To be successful, cooperative strategies must, among 
other things, be forgiving and retaliating; both of these 
attributes require memory (Axelrod, 1987). Forgiving strategies 
(eventually) cooperate in response to their opponents 
cooperating, even if the opponent defected in the past. 
Conversely, retaliating strategies (eventually) defect in response 
to their opponents defecting. Both of these strategies are 
only possible if the player is able to remember the 
opponent’s actions. Thus, we can reasonably expect memory to 
be worth sacrificing some percentage of a player’s payout, 
an assumption which is borne out by prior research (Crowley 
et al., 1996). 
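
The interplay between Always Defect and Tit-for-Tat can be checked with a short simulation. The sketch below is illustrative only: T = 5 and P = 1 are taken from Table 1, whereas R = 3 and S = 0 are assumed conventional values that do not appear in the surviving text, and the function names are hypothetical. Games last 64 iterations, the game length used in the Methods.

```python
# Payouts to the row player. T = 5 and P = 1 appear in Table 1;
# R = 3 and S = 0 are assumed (conventional values, not taken from the text).
PAYOUT = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def always_defect(opponent_history):
    return "D"


def tit_for_tat(opponent_history):
    # Cooperate first, then copy the opponent's previous action.
    return opponent_history[-1] if opponent_history else "C"


def average_payout(player, opponent, iterations=64):
    """Average per-iteration payout to `player` over one game."""
    player_seen, opponent_seen = [], []  # what each side has observed so far
    total = 0
    for _ in range(iterations):
        a = player(player_seen)
        b = opponent(opponent_seen)
        total += PAYOUT[(a, b)]
        player_seen.append(b)
        opponent_seen.append(a)
    return total / iterations


print(average_payout(always_defect, tit_for_tat))  # (5 + 63 * 1) / 64 = 1.0625
print(average_payout(tit_for_tat, always_defect))  # (0 + 63 * 1) / 64 = 0.984375
print(average_payout(tit_for_tat, tit_for_tat))    # 3.0
```

With these assumed values the averages round to the 1.06, 0.98 and 3.00 reported for the corresponding entries of Table 2.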

Methods 

Our system is a genetic algorithm, where fitness is based 
on the cumulative payout of IPD. A genetic algorithm is a 
method for computationally solving problems that maintains 
and generates a population of potential solutions by selecting 
the most successful ones and allowing them to reproduce 
(Goldberg and Holland, 1988). There are four important 
components within a genetic algorithm: the representation 
of a genotype, the initialization of the population, the 
mechanism for selecting the next generation, and the mutation 
operators (Mitchell, 1996). To facilitate validation of our 
approach via comparison to the results of previous research, 
we based our system on systems that have successfully 
been used to study IPD in the past (Axelrod, 1987; Crowley 
et al., 1996; Kraines and Kraines, 2000). Crowley et al.’s 
set-up was a particularly strong influence, as their system 
allowed for flexible evolution of memory-using strategies. 
Our implementation is open source and available on GitHub: 
https://github.com/mikaelaleas/ChangingEnvironmentGA. 

Strategy | AD   | TFT  | R    | Average
AD (0)   | 1.00 | 1.06 | 3.00 | 1.69
TFT (1)  | 0.98 | 3.00 | 2.24 | 2.07
TTFT (2) | 0.98 | 3.00 | 2.60 | 2.19

Table 2: Payouts for Optimal Strategies for the Static 
Environment. A player’s payout is determined from the 
Prisoner’s Dilemma matrix (Table 1). In our static environment, 
the player competes against three static strategies: Always 
Defect (AD), Tit-for-Tat (TFT), and Random (R) over 64 
iterations. The player’s optimal strategy is dependent on the 
size of its memory. The AD strategy uses zero bits of memory, 
while the TFT strategy uses one bit of memory. When 
the player has one bit of memory, the best strategy is 
Tit-for-Tat. In this environment, the optimal strategy when a player 
has two bits of memory is to first cooperate and then defect 
any time the opponent has defected in the player’s memory. 
This strategy is called Two-Tits-for-Tat (TTFT). 

Representation of Genotype 

Genotypes in our system are closely based on those used 
by Crowley et al. (1996). An individual’s genotype has three 
components: (1) the amount of memory it uses, (2) the initial 
state of its memory, and (3) its decision list. (1) The size 
of an individual’s memory is the number of previous iterations 
for which it can remember its opponent’s actions (as a 
simplification, organisms are unable to remember their own 
actions). Each bit of memory can hold information about 
one iteration. Since the decision list grows exponentially 
with the amount of memory used, we limit individuals to 
have no more than four bits of memory; that is, individuals 
can remember up to four iterations of their opponent’s 
actions. (2) Next, since the memory is supposed to be a list of 
the opponent’s actions, its initial state (before the opponent 
has actually played any iterations) biases the early decisions 
made by an individual. The initial state of this memory is 
allowed to evolve. The individual’s memory is subsequently 
updated every iteration of IPD, with the oldest past action 
being removed and the most recent action being added (see 
Figure 1). (3) Finally, the decision list is used to specify 
which action an individual will take, given a particular memory 
state (see Figure 2). The length of the decision list is 2^n, 
where n is the number of bits of memory, with an entry 
corresponding to every possible state of memory. The initial 
population, of size 500, was composed of individuals with 
each of these components randomly selected. Populations 
were allowed to evolve for 500 generations. 
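
The decision-list lookup can be sketched in a few lines. This is an illustration rather than the code in the authors' repository: the bit encoding (defect = 0, cooperate = 1) and the worked example follow the Figure 2 caption, while the function names, the storage order of the memory (oldest action first) and the unspecified decision-list entries are assumptions.

```python
COOPERATE, DEFECT = 1, 0  # encoding given in the Figure 2 caption


def memory_index(memory):
    """Read the memory bits as a binary number (first bit most significant)."""
    index = 0
    for bit in memory:
        index = index * 2 + bit
    return index


def decide(memory, decision_list):
    """Look up the action for the current memory state."""
    assert len(decision_list) == 2 ** len(memory)
    return decision_list[memory_index(memory)]


def update_memory(memory, opponent_action):
    """Drop the oldest remembered action and append the newest one.

    Assumption: the oldest action is stored first; the paper does not say.
    """
    return memory[1:] + [opponent_action] if memory else []


# Example from Figure 2: memory CD is binary 10, i.e. index 2.
memory = [COOPERATE, DEFECT]
decision_list = [DEFECT, DEFECT, COOPERATE, DEFECT]  # hypothetical except index 2
print(memory_index(memory))              # 2
print(decide(memory, decision_list))     # 1 (cooperate)
print(update_memory(memory, COOPERATE))  # [0, 1]
```

A zero-bit genotype is handled by the same lookup: its memory is empty, the index is 0, and the decision list has the single entry that the strategy always plays.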
Figure 1: Single Iteration of Prisoner’s Dilemma. The 
player has three components: size (in bits), initial memory, 
and decision list. In a single round, a player will use the 
initial memory and decision list to decide whether to cooperate 
(C) or defect (D). A player’s initial memory is updated 
every round to store the opponent’s last action. The decision 
list does not change during an individual’s lifetime. Here, 
player 1 cooperates with player 2 and player 2 cooperates 
with player 1. Player 1’s initial memory is updated to reflect 
player 2’s cooperation. 

Selection of Next Generation 

To select which individuals contribute offspring to the next 
generation: (1) a fitness score is generated for each 
individual, and (2) the population participates in a tournament. 
To determine a fitness score, individuals play 64 iterations 
of Prisoner’s Dilemma (1 game) against competitors. In 
the static environment that we use to validate this approach, 
these competitors have three predetermined strategies: 
Always Defect, Tit-for-Tat, and Random. These three strategies 
were chosen to keep the model simple, allowing 
for a focus on the evolution of memory-using strategies. In 
the coevolutionary environment, these competitors are 
randomly chosen from the population. Based on the IPD payout 
matrix, each individual is awarded a payout. This payout is 
multiplied by the difference between 1 and the total cost of 
memory (accounting for all of the bits). The result is the 
fitness score. 

\[
\mathrm{fitness} = \mathrm{payout} \times (1 - \mathrm{cost} \times \mathrm{size}) \qquad (1)
\]

Figure 2: Initial Memory and the Decision List. During a 
single iteration of Prisoner’s Dilemma, a player chooses to 
cooperate (C) or defect (D) based on its decision list. Defect 
is represented with a 0 and cooperate with a 1. In this 
example, the initial memory is CD, which is represented as the 
binary number 10 (i.e. 2, in decimal). This points to index 
2 in the decision list, which contains a C, so this player will 
cooperate in this iteration. 

The fitness score calculation determines how the cost of
memory affects the fitness of an individual. The cost of
memory is fixed prior to the experiment. Finally, an average
fitness for each individual is calculated. The next generation
is produced through a tournament-style selection. The population
is divided into subgroups of 10 individuals. The best
half of each group, those with the highest fitness scores, are
selected for the next generation. Note that this is a slightly
gentler selection scheme than the one used by Crowley et al.
(1996); we chose it because we felt that the reduced elitism
was a better analog for the biological systems we are ultimately
interested in understanding.
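
The tournament step can be sketched as below. This is an illustrative reading, not the authors' code: the function name, the random assignment of individuals to groups, and the `rng` parameter are assumptions. How the selected half is expanded back to the full population size is not specified in this section, so the sketch stops at the survivors.

```python
import random

def select_survivors(population, fitness, group_size=10, rng=random):
    """Split the population into groups and keep the fitter half of each.

    `fitness` is a callable returning an individual's average fitness.
    Grouping by random shuffle is an assumption of this sketch.
    """
    shuffled = list(population)
    rng.shuffle(shuffled)
    survivors = []
    for start in range(0, len(shuffled), group_size):
        group = shuffled[start:start + group_size]
        group.sort(key=fitness, reverse=True)      # best first
        survivors.extend(group[:len(group) // 2])  # top half survives
    return survivors

# Toy usage: individuals are integers and fitness is the value itself.
survivors = select_survivors(range(500), fitness=lambda x: x,
                             rng=random.Random(0))
```

Because only the best half of each small group survives, a mediocre individual can still be selected if it lands in a weak group, which is one way to see why the scheme is gentler than strict elitism.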

Mutations 

Mutations occur probabilistically after the next generation
is selected. There are three classes of mutations that can occur,
corresponding to each of the portions of the genome:
(1) size mutations, (2) initial memory mutations, and (3) decision
mutations. All three types of mutations have a fixed
probability of 0.01 of occurring when offspring are created.
(1) A size mutation will increase or decrease the size of an
individual’s memory by 1 bit. This change affects the length
of the decision list and the initial memory state of the individual.
If the size of the memory is increased, the decision
list will be duplicated, meaning that increasing memory
has no immediate effect on behavior unless one of the other
types of mutations also occurs. However, if the size of the
memory is decreased, the decision list is halved by removing
the least significant bits (most distant in the past memory
position). (2) The memory mutation affects the initial state
of memory. This mutation will randomly choose an index
of the initial memory and toggle the action (cooperate or defect)
at that position. (3) The decision mutation targets the
decision list. This mutation will randomly choose an index
of the decision list and toggle the action (cooperate or defect)
at that position.

Figure 3: Average Number of Bits of Memory Used By
Cost of Memory. Shaded area represents standard deviation
for each line. The cost of memory had a strong impact on
the average number of bits of memory used by the population
(Kruskal-Wallis test, chi-squared = 168.44, df = 8, p <
.0001). When the cost of memory increases, the average
number of bits of memory decreases (post-hoc Wilcoxon
rank-sum test with Bonferroni correction). The average
number of bits used at each memory cost is consistent with
the predicted values from Table 3.
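
The three mutation classes can be sketched as follows. Several details here are assumptions made for illustration and are not confirmed by the text: memory is stored as a list of bits with the oldest action last, a newly added bit gets a random value, duplication is done by repeating each decision entry (so behavior does not depend on the new bit), halving keeps every other entry, and `max_bits` is a hypothetical cap. All function names are hypothetical.

```python
import random

MUTATION_RATE = 0.01  # per mutation class, per offspring (from the text)

def grow(decision_list):
    """Add one memory bit: repeat each entry so behavior is unchanged."""
    return [d for d in decision_list for _ in range(2)]

def shrink(decision_list):
    """Remove one memory bit: keep every other entry, halving the list."""
    return decision_list[::2]

def toggle(bits, index):
    """Flip cooperate (1) <-> defect (0) at one position."""
    flipped = list(bits)
    flipped[index] ^= 1
    return flipped

def mutate(size, initial_memory, decision_list, max_bits=4, rng=random):
    """Apply each mutation class independently with MUTATION_RATE."""
    if rng.random() < MUTATION_RATE:  # (1) size mutation
        step = rng.choice([-1, 1])
        if step == 1 and size < max_bits:
            size += 1
            initial_memory = initial_memory + [rng.randint(0, 1)]
            decision_list = grow(decision_list)
        elif step == -1 and size > 0:
            size -= 1
            initial_memory = initial_memory[:-1]  # drop the oldest bit
            decision_list = shrink(decision_list)
    if size > 0 and rng.random() < MUTATION_RATE:  # (2) initial memory
        initial_memory = toggle(initial_memory, rng.randrange(size))
    if rng.random() < MUTATION_RATE:  # (3) decision list
        decision_list = toggle(decision_list,
                               rng.randrange(len(decision_list)))
    return size, initial_memory, decision_list
```

Under these assumptions, growing and then shrinking a decision list returns the original list, so a size mutation on its own never changes which action a memory state maps to.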

Results and Discussion 
Static Environment 

To verify this system’s efficacy, we started out by allowing
strategies to evolve in a static environment in which each
player competed against three static strategies: Always Defect,
Tit-for-Tat, and Random. In this scenario, we can
deterministically calculate how much a bit of memory should
be worth in each context. The expected fitness and the highest
memory cost for which players evolve to use memory
are calculated from the Prisoner’s Dilemma payout matrix,
the individual’s size, and the memory cost. The individual
plays 64 iterations of IPD against each of the three strategies
and receives payouts accordingly. The payouts are then
adjusted according to the individual’s size and the cost of
memory, to determine the individual’s fitness (Equation 1).
Using more bits of memory allows the player to recall more
previous actions of the opponent and thus determine which
strategy the opponent is using. Once an individual is able
to determine its opponent’s strategy, it may alter its future
actions to increase its payout. This enables the evolution of
better strategies that are able to retaliate against opponents
if exploited. For example, an individual using the Always
Defect strategy receives an average payout per iteration of
1.69 (see Table 2). If the cost of memory were 0.01 and
the individual had one bit of memory, that payout would be
reduced to 1.67. Using two bits of memory would further
decrease the payout to 1.65. When there is no fitness cost,
the optimal strategy is to start out cooperating, use the maximum
allowed amount of memory, and defect any time an
opponent has defected within memory. This will result in an
individual always defecting after the first iteration against
Always Defect, cooperating with Tit-for-Tat, and recognizing
Random as frequently as possible. However, there are
diminishing returns to adding additional bits of memory (see
Table 2); in this simple setup, the greatest fitness improvement
comes from adding the first bit, making Tit-for-Tat a
possible strategy.

Strategy   Cost    AD     TFT    R      Average
AD (0)     0.01    1.00   1.06   3.00   1.69
TFT (1)    0.01    0.97   2.97   2.22   2.05
TTFT (2)   0.01    0.96   2.94   2.55   2.15
AD (0)     0.075   1.00   1.06   3.00   1.69
TFT (1)    0.075   0.91   2.78   2.07   1.92
TTFT (2)   0.075   0.83   2.55   2.21   1.86
AD (0)     0.2     1.00   1.06   3.00   1.69
TFT (1)    0.2     0.78   2.40   1.79   1.65
TTFT (2)   0.2     0.59   1.80   1.56   1.32

Table 3: Expected Average Fitness by Cost of Memory
in the Static Environment. This table shows the expected
average payout per iteration for the optimal strategies for 0,
1, and 2 bits of memory, adjusted by various costs of memory.
The parenthetical next to each strategy name denotes
the number of bits of memory that it uses. Here, we show
three costs, each of which favors a different strategy: Always
Defect, Tit-for-Tat, or Two-Tits-for-Tat.
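
The Always Defect arithmetic in this section can be checked with a short script. This is a sketch under stated assumptions: the function name `fitness` is hypothetical, and the three per-opponent payouts are the rounded Always Defect values from Table 3, so the results are only accurate to about two decimal places.

```python
def fitness(payout, cost, size):
    """Equation 1: payout scaled by one minus the total memory cost."""
    return payout * (1 - cost * size)

# Always Defect payouts per iteration against AD, TFT, and Random,
# taken from the rounded entries of Table 3.
ad_payouts = [1.00, 1.06, 3.00]
ad_average = sum(ad_payouts) / len(ad_payouts)

# Fitness of an Always Defect player carrying 0, 1, or 2 unused bits
# of memory at a cost of 0.01 per bit.
adjusted = [round(fitness(ad_average, cost=0.01, size=bits), 2)
            for bits in (0, 1, 2)]
print(adjusted)  # [1.69, 1.67, 1.65]
```

The values match the 1.69, 1.67, and 1.65 quoted in the text: memory that is paid for but not used to improve the payout simply lowers fitness.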

When a cost is applied to memory, the optimal strategy
may change (see Table 3). If our system is accurately measuring
the value of memory, we would expect to see Always
Defect be the dominant strategy when the cost per bit of
memory is 0.18 or greater, Tit-for-Tat be dominant when
the cost is between 0.065 and 0.18, and so on. This result
is almost exactly what we see in practice (see Figure 3). As
predicted, this shift seems to be driven by an increase in
Tit-for-Tat-style strategies as the cost of memory decreases (see
Figure 4).
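
One way to see where such cost thresholds come from is to set the Equation 1 fitness of two strategies equal and solve for the cost. The notation below is an assumption introduced for illustration: p_k is the unadjusted average payout of the best strategy using k bits, c is the cost per bit, and c* is the break-even cost.

```latex
% Break-even cost between a k-bit strategy and a j-bit strategy (k > j)
p_k\,(1 - c\,k) = p_j\,(1 - c\,j)
\quad\Longrightarrow\quad
c^{*} = \frac{p_k - p_j}{k\,p_k - j\,p_j}

% Special case: one bit of memory versus none (j = 0, k = 1)
c^{*} = 1 - \frac{p_0}{p_1}
```

Above c* the strategy with fewer bits has the higher fitness; below it, the extra memory pays for itself.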

The one slightly unexpected result is that, even when
memory has no cost, strategies don’t tend to use much more
than three bits of memory. We hypothesize that this is due to
the following mechanism: Every additional bit of memory
doubles the size of an individual’s decision list. An excessively
large decision list is at increased risk of experiencing
genetic drift away from the optimal values. Thus, the potential
fitness gain from adding a fourth bit of memory may not
be worth the increased risk of the lineage making incorrect
moves later on. Such a scenario would be consistent with
the decreased recognition accuracy found by Crowley et al.
(1996).

Figure 4: Most Common Strategies By Cost of Memory.
We calculated the most commonly used (dominant) strategy
in each of the 20 replicates within each of the 5 memory
cost conditions. The four dominant strategies we observed
were 3TFT (Three-Tits-for-Tat, the optimal strategy
with three bits of memory), 2TFT (Two-Tits-for-Tat, the optimal
strategy with two bits of memory), TFT (the optimal
strategy with one bit of memory), and AD (the optimal strategy
with no memory). As expected, the dominant strategy
depended on the cost of memory. Increasing the cost of
memory increases the frequency with which less
memory-intensive strategies are dominant.

Coevolutionary environment 

Having demonstrated that our methodology accurately measures
the value of memory in a system, we can now move
on to a more interesting case. Instead of placing solutions in
a static environment, we can allow them to compete against
each other. This scenario introduces complex coevolutionary
dynamics that would normally confound attempts to
measure the value of memory. In this setup, the population
is initially seeded with Tit-for-Tat (one bit of memory)
and each individual plays IPD with each other individual in
its tournament to determine its fitness. As before, the top
half of each tournament is allowed to reproduce. We ran this
treatment at two different mutation rates: low (.01 for each
mutation type) and high (.1 for each mutation type).

Figure 5: Memory Usage in Coevolutionary Environment
(Low Mutation Rate). Shaded area represents standard deviation
for each line. Memory use consistently evolved only
when memory had no cost; the average amount of memory
used in this condition was significantly different from the
amount used in all of the other conditions (Kruskal-Wallis
test and post-hoc Wilcoxon rank-sum test with Bonferroni
correction, chi-squared = 55.93, df = 5, p < .0001). In all
of the other conditions, the average amount of memory used
gradually declines over time.

At a low mutation rate, memory proves far less useful in
this more complex environment, as evidenced by the fact
that it is not consistently used if it has any cost associated
with it (see Figure 5). As in the previous experiment, increasing
the memory cost increases the percentage of replicates
in which Always Defect, rather than Tit-for-Tat, becomes
the dominant strategy. When examining individual
runs, a common pattern takes place. The initial population
of Tit-for-Tat is frequently invaded by Always Cooperate.
Always Cooperate can displace Tit-for-Tat (in the absence
of other competitors) because it receives the same payout,
but does not have to pay any cost for memory. Once
Tit-for-Tat is extinct (or nearly so), Always Defect arises and
quickly displaces Always Cooperate. In the low mutation
rate replicates, Tit-for-Tat rarely is generated via mutation
from Always Defect, leading to a stable population that is
trapped at a sub-optimal strategy. Although Crowley et al.
did not analyze the strategies that evolved in their system,
these results are consistent with theirs in that they too observed
that applying a cost to memory resulted in decreased
cooperation (Crowley et al., 1996).

Interestingly, memory use in a coevolutionary context increases
at the higher mutation rate (see Figure 6). When
memory is free in this treatment, strategies quickly evolve
to use the maximum allowed amount of memory, suggesting
that the implicit costs of making use of a large memory are
overwhelmed by coevolutionary selective pressures. Alternatively,
the large decision lists that individuals with a lot of
memory have may serve to increase mutational robustness.
This effect would be in contrast to the results observed in the
static environment and in previous research (Crowley et al.,
1996). Understanding the relationship between these factors
would be an interesting direction to explore in the future.

Figure 6: Memory Usage in Coevolutionary Environment
(High Mutation Rate). Shaded area represents standard deviation
for each line. Populations evolved to use more memory
at lower costs. At memory costs of .075 and higher, the
average amount of memory used by the population after 500
generations was not significantly different from 0
(Kruskal-Wallis test and post-hoc Wilcoxon rank-sum test with
Bonferroni correction, chi-squared = 95.24, df = 5, p < .0001).

In the condition with no memory cost, Tit-for-Tat is the
most common strategy in approximately half of the replicates,
a finding which is consistent with Tit-for-Tat’s dominance
in the Axelrod tournament (Axelrod, 1987). Among
the other half of the replicates there is a striking diversity
of most common strategies: only two of the other replicates
have the same most common strategy. Applying any
cost to memory causes the population to converge to
well-known strategies (see Figure 7). These results align with
Mayley’s finding that applying a cost to learning (analogous
to memory, in our case) substantially inhibits the exploration
of strategies that would require it (Mayley, 1996).

Conclusion 

We demonstrated the evolutionary value of memory by using
a genetic algorithm that awards fitness based on the results
of many iterations of the Iterated Prisoner’s Dilemma.
Under static environmental conditions, the population often
evolved to use memory, despite it being costly, as long as it
provided a substantial gain in payout. In fact, the extent to
which memory was used aligned nearly perfectly with theoretical
predictions about the costs and benefits of memory
in this system. This result demonstrates that the technique
proposed here is an effective way to quantify the value of
memory in evolutionary contexts. By simply giving memory
a fitness cost and observing whether memory evolves,
we can assess its importance in complex scenarios.

Figure 7: Most Common Strategies in Coevolutionary
Environment (High Mutation Rate). Again, Tit-for-Tat is
more frequently the dominant strategy at lower memory
costs. Note that this figure does not include the strategies
used when there was no cost to memory, because there were
too many of them. Approximately half of the replicates in
the 0 cost condition of this treatment used Tit-for-Tat, and
the other half each had a different dominant strategy (although
most of the dominant strategies were not dramatically
more prevalent than other strategies in the population).
Also note that the strategy on the far left, 0011~11, is denoted
only by its genotype (decision list~initial memory),
as it does not correspond to a well-known named strategy.
It cooperates initially, and any time its opponent cooperated
two iterations ago.

In more dynamic environments, we observed that memory
was valuable when there were no costs because it enabled
cooperation. However, it was easily evolved away under
high memory costs (where Always Defect could rapidly
overtake Tit-for-Tat) or low mutation rates (where Always
Cooperate could outcompete Tit-for-Tat and subsequently
be outcompeted by Always Defect). While this phenomenon
illustrates the difficulty of measuring the value of memory in
an environment where that value keeps changing, our results
were consistent with the findings of prior research and we
were able to more fully investigate the mechanisms behind
them.

While we were able to show the value of a single bit of
memory, the evolutionary dynamics explored here generally
did not provide a substantial benefit to having larger amounts
of memory. In light of these early findings, we plan to extend
this research, both in static environments (to test our
analytical predictions of the value of memory) and in dynamic
coevolutionary environments (to study the practical
evolution of memory in realistic scenarios).

For static environments, we plan to explore the evolutionary
response of players to imperfect opponents, such as
those that attempt to engage in Tit-for-Tat, but make occasional
errors. A single mistake can spiral into a high level
of defection and much lower overall payouts, but if a player
uses larger amounts of memory, it will be able to recognize
and forgive mistakes for a longer-term benefit. We will also
explore introducing longer-term memory that the player can
set as it chooses. We will provide these players with combinations
of opponents that require long-term memory to receive
optimal payouts, such as Always Cooperate and
Tit-for-Tat. In such cases, a player with long-term memory will
be able to initially probe to determine whether its opponent
responds negatively to a defection. If so, it can play
Tit-for-Tat from then on (starting with a cooperation). On the other
hand, if the opponent does not retaliate, the player knows
that it can play Always Defect from then on out for a larger
payout.

Dynamic environments have an even wider potential for
helping us learn more about the evolution of memory. As
of now, it is challenging to evolve cooperative strategies de
novo. They require memory to increase, immediately incurring
a cost, but no gain is realized until a cooperative strategy
is in place and multiple players are using it and interacting.
We plan to explore structured populations with smaller, local
groups where kin selection effects can dominate and selection
is weaker, allowing these strategies to more easily come
into play. We also plan to explore more stabilizing forces
once players are engaging in cooperation so that it doesn’t
evolve away as easily as we saw here.

Overall, this work is an important step in studying the 
early evolution of memory utilization, and insights from it 
are likely to be valuable in informing other real and artificial 
life studies involving the evolution of intelligence. 

Abstract

In both social systems and ecosystems there is a need to resolve 
potential conflicts between the interests of individuals and the 
collective interest of the community. The collective interests 
need to survive the turbulent dynamics of social and ecological 
interactions. To see how different systems with different sets of 
interactions have varying degrees of robustness, we need to 
look at their different contingent histories. We analyse abstract 
Artificial Life models of such systems, and note that some 
prominent examples rely on explicitly a-historical frameworks; 
we point out where analyses that ignore a contingent historical 
context can be fatally flawed. Real life studies highlight the role 
of history, and Artificial Life studies should do likewise. 

Introduction 

In both ecosystems and social systems there are at least two 
levels at which, speaking loosely, ‘lifelike’ processes can be 
observed. There is one level at which the individual 
organisms, animals, humans are interacting with each other 
and pursuing their individual interests. But also there is a 
second level of ecosystem organisation, or social organisation, 
which provides the context within which they exist. In 
principle the same individuals could function, perhaps more or 
less successfully, if the ecosystem/social organisation was 
changed. One extreme version of such a change would be for 
the ecosystem/social organisation to break down in chaos, 
which is often against the interests of the individuals 
concerned. Systems survive or die, just as individuals do. 

Ecosystems versus Social Systems. Social organisation can 
be the outcome of a social contract where individuals have 
chosen to agree to a set of rules. Ecosystems do not involve 
such explicit choice. Regardless of such differences, in both 
cases one can analyse individual behaviour in terms of self- 
interest potentially clashing with the interests of others 
around. In social systems we may call some actions ‘cheating’ 
and some consequences ‘punishment’. In ecosystems we tend 
to avoid such moral overtones and merely discuss ‘actions’ 
and ‘consequences’; the analyses may nevertheless be similar. 

How do they Persist? If a specific ecosystem/social system 
survives for a long time, explanation is called for. If no 
external authority is responsible for imposing this, then the 
organisation must be an emergent consequence of individual 
patterns of behaviour that are globally somewhat resilient to 
the perturbations of everyday life. We may ask how one 
specific ecosystem/social system manages to persist, or we 
may ask about generic properties needed for persistence. 

What is their Origin? Each specific ecosystem/social system
will have its own unique history, from origins up to the
present day; just as each organism has its unique genetic and
developmental history. It is the main thesis of this paper that
generic theories that gloss over or average over such specific
histories often fail to capture salient features of reality.
Examples of such theories will be criticised.

                  Real Life                   Artificial Life
Social systems    Bitcoin                     Iterated Prisoner’s Dilemma
                  (Nakamoto, 2008)            (Press and Dyson, 2012)
                  Common pool resources       (Stewart and Plotkin, 2013)
                  (Ostrom, 1990)
Natural systems   Ecosystems                  Daisyworld
                  Niche construction          (Watson & Lovelock, 1983)
                  (Clements, 1916)            (Harvey, 2015)
                  (Lewontin, 1969)            Complex systems
                                              (May, 1972)

Table 1: Classes of decentralised social systems and natural
(eco-)systems and their Alife counterparts analysed here.

Real Life. We consider both real systems and their artificial 
life counterparts, as in different columns of Table 1. Amongst 
real social systems we look at Bitcoin as a money transfer 
system, and common pool resource governance as studied by 
economists. Natural systems refer here to ecosystems as 
studied by ecologists in the field. It will be suggested that 
those studying such real life systems will have no problems 
agreeing with the thesis that history matters. Hence this paper 
is mainly targeted at those producing abstract models that 
explicitly leave out any consideration of history. 

Artificial Life Models. Simple abstract models of social 
systems are illustrated here by examples from IPD, Iterated 
Prisoner’s Dilemma. This is based on a classic two-person 
game where each player has simple choices and the 
interactions between them have consequences in terms of 
different payoffs. The basic dilemma of individual cheating 
versus cooperation is distilled into this simplest form. 
Ecosystem models discussed here include Daisyworld models 
where the organisms and environmental influences are 
characterised as variables interacting in a dynamical system. 

Mathematical Summary. If processes are actually non- 
Markovian, modelling them as Markovian will lead to error. 

Plain Language Summary. Real life takes place in a world of 
accumulated historical accidents that affect how social and 
ecological processes actually function. History matters. 

Artificial Life Models of Governance

The Introduction to ‘Leviathan’ (Hobbes, 1651) gives the first
known reference to artificial life under that name:

NATURE (the art whereby God hath made and governes
the world) is by the art of man, as in many other things,
so in this also imitated, that it can make an Artificial
Animal. For seeing life is but a motion of Limbs, the
beginning whereof is in some principall part within, why
may we not say that all Automata (Engines that move
themselves by springs and wheels as doth a watch) have
an artificiall life?

This introduces the metaphor of a nation state as an artificial
man, Leviathan, with different components functioning
together as mechanically deterministic parts but forming a
living whole. What sort of governance can provide some form
of global harmony ensuring cooperation and collaboration
between component parts and reconciling any potential
conflicts between them? How does some form of social
contract arise (and continue to survive) from a natural state of
anarchy? Hobbes’ answer was for central rule by an absolute
sovereign. Though such a sovereign is ultimately driven by
his private interests, these are aligned with the public interests
in so far as “The riches, power, and honour of a monarch arise
only from the riches, strength, and reputation of his subjects”.

Here we follow Hobbes’ surprisingly modern notion of
artificial life, and the use of models such as Automata in the
study of real life governance. However, we part company with
him on his assumption of a need for a central sovereign.
Leviathan is Hobbes’ exemplar of central authority, but in all
further examples we discuss below there is no central
authority; no rules for behaviour are imposed from outside.

Distributed Social Systems, Choice, Attractors

When governance arises solely through interactions between
individual participants, different styles of governance can only
arise through the different choices they make. Strategies any
one individual has for choosing are typically conditional in
part on the choices the others have made. These individual
choices bind into a social system when there is a stable pattern
that persists despite potential disturbances from within or
without. In dynamical systems terms, we are looking for the
attractors of such systems. There may be many possible such
attractors, some more congenial than others to the participants,
e.g. with higher payoffs in utilitarian terms.

Ecosystems, Choice, Attractors

With natural ecosystems we may not be considering the same
type of explicit strategies or choices seen in social systems.
Nevertheless, a different type of ‘choice’ is available, a choice
of where to locate, which environment to inhabit. Animals
may move from valley to hilltop; even plants, over
generations, can shift their habitat. In this subtly different
sense of choice, the component members of an ecosystem
have ‘chosen’ to coexist in a specific locale where their
various interactions (including their own knock-on effects on
the environment) allow them to thrive. In the theoretical space
of all conceivable ecosystems, there is a multitude of such
viable and robust locales that act as potential attractors.

Real Social Systems

We focus here on two classes of real social systems with
distributed governance, chosen to illustrate different facets of
dependence on history: Bitcoin and common pool resource
systems. Such systems are only stable if they are indeed
currently near an attractor, which is another way of saying that
they recover from small disturbances. We look at how such
systems may adapt to changing circumstances over the longer
term, and hence the role of historical contingency in how they
come to be in one attractor rather than another.

Bitcoin 

In commerce there needs to be common agreement about 
who paid what to whom; this is often centrally regulated by 
banks maintaining records. One version of this governance 
problem is that of verifying money transfers over the internet, 
and a very different style of solution is provided by Bitcoin 
(Nakamoto, 2008). Here the maintenance of book-keeping 
records is distributed, not centralised. The protocol used has to 
reconcile private interests with public interests; an individual 
would (dishonestly) benefit by spending the same money 
twice, but a money transfer system only works if such
double-spending is prevented. Roughly speaking, this replaces trust in
a sovereign central bank with trust in a majority consensus of 
multiple independent record-keepers distributed across the 
internet. This may be compared with a simple natural 
biological example where consensus amongst bacteria can be 
achieved via ‘quorum sensing’ (Miller and Bassler, 2001). 


Figure 1: (a) DNA has statistical continuity over
phylogenetic history, with noise. Older and newer data both
matter. (b) Blockchain is built up systematically with new
blocks added at the end, verified by consensus via
‘key-finding’ for each addition.


Blockchains, DNA and Costs. In Bitcoin the official record 
of all transactions in recorded history is maintained in a data 
object called a blockchain. Somewhat like DNA, this is a 
linear string of digits, meaning it is virtually free and 
instantaneous to copy. Multiple copies can be distributed 
widely. Like DNA, it can grow incrementally over time. 
Unlike DNA, the blockchain of accounting records cannot 
mutate or have parts excised; the protocol has to maintain 
accuracy and integrity across all copies of the blockchain as it 
is updated with new transactions (Figure 1). New transactions 
are bundled together into a block to be added on to the end of
the blockchain; then a deliberately lengthy and 
computationally expensive process is undertaken by each 
record-keeper to find a ‘key’ to validate it. This 
cryptographically-based key must identify both the old 
(mutually agreed) transaction history and the new block of 
transactions. Different record-keepers may have different 
updates to add (i.e. different new blocks), but the protocol 
must ensure agreement on just one of these as authoritative. 
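
The ‘key-finding’ step can be illustrated with a toy proof-of-work in Python. This is a simplified sketch, not the Bitcoin protocol: real Bitcoin hashes a structured block header and compares the result against a numeric target, whereas here the function names, the string payload, and the rule "the hash must start with a given number of zero hex digits" are all assumptions made for illustration.

```python
import hashlib

def block_hash(prev_hash, transactions, key):
    """Hash the agreed history (via prev_hash), the new block, and the key."""
    payload = f"{prev_hash}|{transactions}|{key}".encode()
    return hashlib.sha256(payload).hexdigest()

def is_valid(prev_hash, transactions, key, difficulty=2):
    """Checking a proposed key costs a single hash."""
    digest = block_hash(prev_hash, transactions, key)
    return digest.startswith("0" * difficulty)

def find_key(prev_hash, transactions, difficulty=2):
    """Finding a key is deliberately expensive: brute-force search."""
    key = 0
    while not is_valid(prev_hash, transactions, key, difficulty):
        key += 1
    return key

genesis = "0" * 64                     # placeholder for the agreed history
block = "alice->bob:5"                 # hypothetical new transactions
key = find_key(genesis, block)         # many hashes to find
accepted = is_valid(genesis, block, key)  # one hash to verify
```

Because the key depends on both the previous hash and the new block, changing any earlier record would invalidate every key found after it, which is how the accumulated history is protected.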

With DNA, the stored data is expensive to accumulate. 
Evolution has selected for what is transmitted and preserved, 
at the expense of numerous other versions that were selected 
against. With blockchains there is also expense, but arising 
differently in the key-finding exercise; the need for this 
expense arises from the need to avoid cheating. From a 
dynamical systems perspective, both with blockchains and 
with DNA (within a single species) there is maintenance of a 
steady state far from equilibrium, a steady state that preserves 
the information across multiple copies. But this is a 
metastable steady state with possibilities for change and 
growth, as the history is not only maintained but added to. 

Though blockchain technology is nowhere near cost-free
(contrary to what some think) and has weaknesses discussed below,
its subtle use of history makes it a powerful tool with uses that 
go beyond the financial record-keeping of Bitcoin. 

Common Pool Resources 

Bitcoin has distributed, not centralised control. At a different 
scale, practical working examples of decentralised control can 
be seen in societies across the world ranging from water 
authorities in California to shared forest usage in Nepal and 
Switzerland and shared fisheries in Turkey. These are 
maintained and policed by the participants themselves rather 
than imposed by some external sovereign authority. Such 
‘common pool resources’ have been the focus of economist 
Ostrom (1990). She proposes a list of design principles or 
‘best practices’ that are common to such robust institutions: 

(1) Clearly identified boundaries between those people and
resources within the institution and those outside. 

(2) Appropriation rules congruent to local social and 
environmental conditions. 

(3) All (or most) members share in making or changing rules. 

(4) People who are users (or accountable to them) monitor the 
appropriation and resource management. 

(5) Sanctions for rule violations are graduated from low to 
high according to the severity or persistence of violations. 

(6) Conflict-resolution mechanisms are local and rapid. 

(7) External authority, e.g. higher government, does not 
enforce its own rule contrary to that of the local institution. 

(8) Where there are multiple levels of governance they are 
organised in multiple nested layers. 

In such common-pool scenarios, unlike Bitcoin, 
anonymous entry or participation is not possible. The potential 
for the system to adapt itself according to changing local 
circumstances further differentiates this from what may be a 
serious weakness of Bitcoin. All participants have not only a 
stake in maintaining the rules (principles 4 and 6) but also in 
changing them (principle 3). Such adaptation in the 
governance system needs to be congruent with local social 
and environmental conditions (principle 2); and the social 
conditions may include further higher or lower level layers of 
governance, overlapping in a nested fashion (principle 8). 
Within the generic constraints of these 8 principles there is 
scope for a multitude of possible governance systems each 
adapted, more or less, to local circumstances and fashioned 
through a historical succession of contingencies. 

ALife Models of Social Systems: IPD 

We move from stability, contingency, history in real systems 
to the same issues in ALife models. Recent innovations in IPD 
(Iterated Prisoner’s Dilemma) provide a case study. 

Motivation for IPD Models. These provide a minimal model 
of 2 agents (‘prisoners’) interacting. They must decide on 
actions independently, but the payoff to each depends on what 
they both decide, and is designed to provide a conflict 
between individual and collective gains. 

The supposed story is that they have agreed beforehand to 
deny everything about some joint crime, but now they are 
interviewed separately by the police. Each has to decide 
whether to keep quiet as promised (‘Cooperate’ with the other 
prisoner) or make some deal with the police (‘Defect’). In 
terms of utility, they both receive R (say 3) if both Cooperate; 
both receive P (1) if both Defect; and if one Defects while the 
other Cooperates, the payout is T (5) to the defector and S (0) to 
the other. The choice of (T, R, P, S) = (5, 3, 1, 0) (Figure 2a, 
following Press and Dyson, 2012) meets the PD condition 
T>R>P>S that implies whatever agent 2’s decision is, agent 1 
would gain more by Defecting than Cooperating. The further 
condition 2R>T+S implies the total payout for both 
Cooperating, 2R, is higher than the total payment when one 
Cooperates and the other Defects. 
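
The two conditions can be checked mechanically. The following is a minimal sketch using the payoff values quoted above; the helper name `payoff` is illustrative and does not come from the paper.

```python
# Payoff values quoted in the text (Press and Dyson, 2012).
T, R, P, S = 5, 3, 1, 0

def payoff(own: str, other: str) -> int:
    """Payoff to the agent playing `own` against `other` ('C' or 'D')."""
    table = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
    return table[(own, other)]

# PD condition: Defecting pays more whatever the other agent does.
defect_dominates = all(payoff("D", o) > payoff("C", o) for o in "CD")

# Further condition: mutual Cooperation beats the total from one C and one D.
mutual_c_best = 2 * R > T + S

print(defect_dominates, mutual_c_best)
```

Both checks hold for (5, 3, 1, 0): Defection dominates in a single game, yet mutual Cooperation gives the larger joint total.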

The rules treat each agent symmetrically, so any difference 
in outcome depends solely on how their strategies interact. In 
a single game with no further consequences, each agent 
maximises their payoff by Defecting, irrespective of the other 
agent’s choice. Hence they both Defect (D), receiving 1 each, 
whereas if both Cooperated (C) each would have received 3. 

If such games are iterated indefinitely, in the IPD, then each 
agent’s actions may influence future responses. Under some 
circumstances a regime of Cooperation for mutual benefit can 
arise; IPD studies usually focus on just what conditions allow 
this and discourage cheats (i.e. Defectors). Such conditions 
provide counter-examples to Hobbes’ intuition that only a 
sovereign authority can guarantee a mutual Commonwealth. 

Tit for Tat, TFT. A typical class of IPD strategy depends 
(either deterministically or probabilistically) on memory of 
the previous choices made by each agent in the previous N 
rounds, N≥1. Tit for Tat (TFT), for example, is the memory-1 
strategy where an agent copies the action that the other agent 
took in the previous round (Axelrod, 1984; Figure 2b). Tit-for- 
Two-Tats is the memory-2 strategy where an agent only 
defects if the opponent defects twice in a row. More generally 
a memory-N strategy can be specified as a table with 4^N 
cells, relating to the 4 possibilities CC, CD, DC, DD (for own 
+ opponent choices) on each of the N previous rounds, each cell 
specifying the probability that C will be chosen by this agent 
in the new round. For example, TFT with memory 1 has these 
probabilities of Cooperating, dependent on the previous 
round: CC 100%, CD 0%, DC 100%, DD 0%. 
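
The table representation can be sketched as follows. This is an illustration only: the dictionary layout and the names `histories`, `TFT` and `TF2T` are assumptions made here, not notation from the paper.

```python
from itertools import product

OUTCOMES = ["CC", "CD", "DC", "DD"]  # own choice + opponent choice in one round

def histories(n: int):
    """All possible n-round histories, most recent round last."""
    return list(product(OUTCOMES, repeat=n))

# Memory-1 Tit for Tat: probability of Cooperating after each outcome.
TFT = {("CC",): 1.0, ("CD",): 0.0, ("DC",): 1.0, ("DD",): 0.0}

# Memory-2 Tit-for-Two-Tats: Defect only if the opponent Defected in both
# of the last two rounds (the second letter of each outcome is the opponent).
TF2T = {h: 0.0 if all(r[1] == "D" for r in h) else 1.0 for h in histories(2)}

# Number of cells for memory 1, 2 and 3.
print(len(histories(1)), len(histories(2)), len(histories(3)))  # 4 16 64
```

A probabilistic strategy simply stores values strictly between 0 and 1 in some cells.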
Figure 2: (a) The IPD payoff table (Press and Dyson, 2012). 
(b) Tit-for-Tat players differentiate into TFTc or TFTd 
(opening play C or D). Different varieties meeting (upper 
square) lead to 3 different end-attractors, average scores of 
(3,3), (1,1) and (2.5,2.5) (i.e. average of (5,0) and (0,5), lower 
square). The weighted (25%, 25%, 50%) average of all these 
attractor scores is different yet again, (2.25,2.25). 

Historical and A-Historical Agents 

Such memory-1 strategies depend explicitly on short-term 
memory of the previous round; but they also depend on long-term 
history of starting conditions, since the very first move 
makes a difference, say C for TFTc or D for TFTd. There are 
two possible routes to finesse this issue, the first being to 
acknowledge that TFTc and TFTd are indeed two different 
strategies with different consequences (Figure 2b). Only when 
TFTc meets another TFTc does the virtuous circle of 
Cooperation take off. If both are instead TFTd then a vicious 
circle of Defection takes over. A TFTc meeting TFTd results 
in alternating CD, DC choices. The starting conditions have a 
permanent effect on which basin of attraction is entered. 
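
The three basins of attraction can be reproduced with a few lines of code. This is a minimal noise-free sketch; the function name `play_tft` and the choice of 1000 rounds are illustrative assumptions.

```python
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def play_tft(first_x: str, first_y: str, rounds: int = 1000):
    """Average per-round scores for two noise-free TFT players."""
    x, y = first_x, first_y
    total_x = total_y = 0
    for _ in range(rounds):
        px, py = PAYOFF[(x, y)]
        total_x += px
        total_y += py
        x, y = y, x  # each player copies the other's previous action
    return total_x / rounds, total_y / rounds

cc = play_tft("C", "C")   # TFTc meets TFTc
dd = play_tft("D", "D")   # TFTd meets TFTd
cd = play_tft("C", "D")   # TFTc meets TFTd, over an even number of rounds
weighted = 0.25 * cc[0] + 0.25 * dd[0] + 0.5 * cd[0]
print(cc, dd, cd, weighted)
```

Over an even number of rounds this gives (3, 3), (1, 1) and (2.5, 2.5), and the 25%, 25%, 50% weighting of Figure 2b gives 2.25.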

A second way to finesse this issue is to arrange affairs so 
that initial conditions eventually become irrelevant, and this 
could be the case with sufficient noise in the system. If with 
high enough probability a choice is accidentally reversed, then 
over enough iterations of IPD all possible basins of attraction 
will be visited. A classic early ALife paper (Lindgren, 1991) 
explicitly used this method. There is a cost to be paid for 
finessing matters this way, however: TFTc and TFTd are now 
indistinguishable in such a theory, despite the fact that over 
any finite run they typically have totally different behaviours. 
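
The loss of the opening move under noise can be illustrated by tracking the probability of each outcome round by round. The sketch below assumes that each player's action is reversed independently with the same probability; the 1% noise level, the 5000 rounds and all function names are illustrative choices, not taken from Lindgren (1991).

```python
from itertools import product

PAYOFF_X = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
STATES = list(product("CD", repeat=2))  # (x's action, y's action)

def flip(a: str) -> str:
    return "D" if a == "C" else "C"

def step(dist: dict, noise: float) -> dict:
    """One round of TFT vs TFT; each action is reversed with prob. `noise`."""
    new = {s: 0.0 for s in STATES}
    for (x, y), p in dist.items():
        ix, iy = y, x  # intended moves: copy the opponent's last action
        for fx, fy in product([False, True], repeat=2):
            w = (noise if fx else 1 - noise) * (noise if fy else 1 - noise)
            nxt = (flip(ix) if fx else ix, flip(iy) if fy else iy)
            new[nxt] += p * w
    return new

def long_run_score(first: tuple, noise: float = 0.01, rounds: int = 5000) -> float:
    """Expected payoff to x after `rounds` rounds, starting from `first`."""
    dist = {s: 1.0 if s == first else 0.0 for s in STATES}
    for _ in range(rounds):
        dist = step(dist, noise)
    return sum(p * PAYOFF_X[s] for s, p in dist.items())

print(long_run_score(("C", "C")), long_run_score(("D", "D")))
```

Under these assumptions the two openings end with the same expected payoff, about 2.25, because all four outcomes become equally likely. The distinction between TFTc and TFTd has been averaged away.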

In principle the IPD game iterates for an arbitrary number 
of rounds, not known in advance. If both players know it to be 
the final game, this becomes a one-shot PD where both must 
rationally Defect. In turn, the penultimate game falls to the 
same analysis, and so on back to the first. An infinite series of 
rounds avoids this trap, but is impossible in practice. But we 
can have a finite, non-predetermined, number of rounds by 
arranging after each iteration a small (e.g. 1%) chance that it 
is then deemed to be the last. Then if any noise (as introduced 
by Lindgren) is small in comparison to this 1%, strategies 
such as TFTc and TFTd will visibly operate in 
different basins of attraction. Real world scenarios typically 
resemble this pattern rather than the infinite-iteration limit. 
For such real world scenarios, the history will matter. 
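
A rough calculation shows why small noise leaves the history visible. The sketch assumes the 1% stopping chance from the text, a noise level of 0.01% per action (an arbitrary value chosen here as "small in comparison"), and independent noise on both players in every round. The function names are hypothetical.

```python
def expected_rounds(stop: float) -> float:
    """Mean game length when each round is the last with probability `stop`."""
    return 1.0 / stop

def prob_noise_free_game(stop: float, noise: float) -> float:
    """Chance that a whole game finishes with no action reversed.

    Sums stop * (1 - stop)**(n - 1) * (1 - noise)**(2 * n) over n >= 1,
    assuming independent noise on both players' actions in every round.
    """
    clean_round = (1.0 - noise) ** 2
    return stop * clean_round / (1.0 - (1.0 - stop) * clean_round)

print(expected_rounds(0.01))                # about 100 rounds per game
print(prob_noise_free_game(0.01, 0.0001))   # roughly 0.98
```

With these values about 98% of games finish before any accidental reversal, so in most games the opening move alone selects the basin of attraction.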

This distinction between historical and a-historical agents is 
the central focus of this paper. Behaviour of the latter depends 
only on recent short-term events held in ‘memory’, whereas 
the former also depends on one-off long-term origins in 
history. Crudely, this can be related to different perspectives 
from Biology and Physics. Typically many biologists are 
interested in a specific species or ecosystem with a specific 
evolutionary history (which we can relate to TFTc or TFTd). 
In contrast physicists, broadly speaking, may be happier 
making broad generalisations across some arbitrary range of 
entities (which we can relate to Lindgren’s TFT); often this 
makes the mathematics more tractable. Taken to extremes, this 
can result in broad statements that are generically true about 
“all possible organisms” assuming ergodicity, thus including 
extant organisms on this planet together with all extinct 
organisms, and indeed all conceivable organisms on all 
conceivable planets; but nevertheless misleading about any 
one specific non-ergodic organism. What is true about generic 
a-historical IPD agent TFT can be false about TFTc or TFTd. 

Press and Dyson 

A recent ground-breaking IPD paper (Press and Dyson, 2012) 
displayed a novel class of memory-1 ZD (Zero Determinant) 
strategies. These allow an agent — provided it no longer had 
the simple ambition to maximise its own payout that 
traditionally is expected in IPD — to tailor its strategy to 
guarantee that the opponent’s payout will average some value 
such as 1.5 (between P and R) regardless of how the opponent 
responds. Or such an extorting agent can guarantee that the 
excess of payoff above P will be shared in unequal 
proportions such as 3:1. The details of these ZD strategies are 
not discussed here. They are highly novel and counter-intuitive 
and are acknowledged by others to be valid, given 
the context; but many of the conclusions Press and Dyson 
drew have been shown to be misplaced (Stewart and Plotkin, 
2013). We summarise these points, then go even further in 
questioning the validity of their Markovian assumptions. 

Extortionate ZD Strategies. Suppose agent_X chooses an 
extortionate ZD strategy that gains a bigger proportion of the 
excess rewards (above a base-level of P) regardless of 
agent_Y’s responses. Then if agent_Y is an optimising player 
that adjusts strategy so as to increase its own payoff (Press 
and Dyson call this an evolutionary player) the result is that 
agent_X scores even higher. The erroneous implication Press 
and Dyson draw is that in an evolutionary scenario where 
multiple strategies are competing against each other, such 
extortionate strategies will triumph and dominate. As Stewart 
and Plotkin (2013) and other commentators point out, this is 
not so. If extortionate players come to dominate an 
evolutionary scenario, they will typically be competing with 
similar extortionate strategies. If agent_X and agent_Y are 
both forcing their excess payout (above P=1) to be 3 times 
greater than their opponent's, this is neatly resolved by the 
excess being 0 for each, the (1,1) score of mutual Defection. 
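
Although the construction is not discussed here, the two claims in this paragraph can be illustrated numerically. The sketch below builds one memory-1 table whose Cooperation probabilities are shifted in proportion to (own excess) minus chi times (opponent's excess). This is offered as a reading of the Press and Dyson (2012) recipe, not as their definitive formulation. The values chi=3 and phi=1/26, the always-Cooperate opponent and all function names are assumptions made for illustration.

```python
from itertools import product

T, R, P, S = 5, 3, 1, 0                      # payoffs quoted in the text
STATES = list(product("CD", repeat=2))       # (own last action, other's last action)
PAY_OWN = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
PAY_OTHER = {(o, t): PAY_OWN[(t, o)] for o, t in STATES}

def extortioner(chi: float, phi: float) -> dict:
    """Memory-1 table giving P(Cooperate) after each (own, other) outcome."""
    return {s: (1.0 if s[0] == "C" else 0.0)
               + phi * ((PAY_OWN[s] - P) - chi * (PAY_OTHER[s] - P))
            for s in STATES}

def average_scores(strat_x: dict, strat_y: dict, rounds: int = 5000):
    """Expected payoffs (to X, to Y) after `rounds` rounds, from a uniform start."""
    dist = {s: 0.25 for s in STATES}         # keyed by (X's action, Y's action)
    for _ in range(rounds):
        new = {s: 0.0 for s in STATES}
        for (x, y), p in dist.items():
            cx = strat_x[(x, y)]             # X sees (own, other) = (x, y)
            cy = strat_y[(y, x)]             # Y sees (own, other) = (y, x)
            for nx, ny in STATES:
                px = cx if nx == "C" else 1.0 - cx
                py = cy if ny == "C" else 1.0 - cy
                new[(nx, ny)] += p * px * py
        dist = new
    score_x = sum(p * PAY_OWN[s] for s, p in dist.items())
    score_y = sum(p * PAY_OTHER[s] for s, p in dist.items())
    return score_x, score_y

zd = extortioner(chi=3.0, phi=1 / 26)
always_c = {s: 1.0 for s in STATES}
sx, sy = average_scores(zd, always_c)
print(sx - P, 3 * (sy - P))                  # the two printed values agree
print(average_scores(zd, zd))                # both scores approach P = 1
```

Against the unconditional Cooperator the extortioner's excess over P is three times the opponent's. When the same table plays itself, both excesses go to zero and play settles into mutual Defection.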

Generous ZD Strategies. It turns out that so-called Generous 
ZD strategies — that roughly speaking do the opposite of 
extortion in making sure that differential benefits mostly 
accrue to their opponents — will dominate in an evolutionary 
scenario. Such Generous strategies behave optimally against 
other Generous strategies, and also replace non-cooperative 
ZD strategies (Stewart and Plotkin, 2013). 
Such ZD Strategies Ignore Historical Contingency 

The main contribution of this paper to this novel development 
in IPD studies is to point out what other commentators have 
apparently missed: this whole class of ZD strategies, whether 
extortionate or generous, has been set up to be a-historical and 
hence to be largely irrelevant as models of human (or animal) 
strategies; since these are typically historical, contingent and 
contextual. Press and Dyson (2012) explicitly set up their ZD 
strategies to use the same finesse Lindgren (1991) uses, as 
discussed above, to average over all possible contingent 
long-term histories; they focus on generic strategies dependent 
on short-term memory alone. Indeed, they go further than 
Lindgren in showing that such Markovian assumptions allow 
any memory-N strategy to be generically equivalent to (some 
other) minimal memory-1 strategy. 

Their proof covers the TFT strategy averaged over all 
possible histories, but fails to cover a TFTc strategy, even with 
its short history of a single first move. A fortiori, such IPD 
results have even less relevance to the real world when e.g. 
analysing the mating behaviour of this specific butterfly, with 
its long evolutionary and ecological history of multiple 
overlapping constraints as context; or when analysing the 
governance system for that Turkish communal fishing 
arrangement, with its long social and cultural history of 
multiple overlapping polycentric social contracts. Ostrom 
(1990) explicitly mentions congruence with local social and 
environmental conditions among her design principles 
observed in long-lasting common pool governance systems, 
and this historical, contingent contextuality is what is stripped 
away in such generic mathematical proofs. Mathematically, 
one cannot analyse non-Markovian processes as if they were 
Markovian. 

Real Ecosystems 

We now consider the systems in the lower row of Table 1, 
starting with a minimal overview of real ecosystems. 

Ecological Succession. This is the observed process of 
change in structure of an ecological community over the 
medium to long term. For instance after a mass extinction a 
typical sequence is for a few species of plants and animals to 
initially return; then successive new organisms arrive, 
building on what is already there in what Ostrom might want 
to call multiple nested polycentric layers in analogy to her 
social systems. In some cases this may be a somewhat 
predictable succession towards a final ‘climax 
community’ (Clements, 1916); but more recent ideas tend to 
take account of the many historical contingencies involved, 
including the varied feedbacks through knock-on 
environmental effects, and see a more unpredictable picture of 
‘alternative stable states’ (Lewontin, 1969). In the short-term 
an ecosystem is in a stable steady state, but in the longer term 
it is somewhat accidental which one of many such possible 
equilibria it is, and what range of fellow organisms it contains. 

Niche Construction. Such theories emphasise that organisms 
may not be merely accepting or selecting (through moving to) 
their specific environment; they may also have an active role 
in changing it (Laland and Sterelny, 2006). 



Figure 3: No-feedback scenario: environmental perturbation 
S (solar output) directly affects local env. T (temperature) 
which directly affects organism D (daisies). (a) D assumed to 
have steady-state dependency, ‘hat-shaped’ function of T, 
giving limited zone of viability. (b) D-viability (binary yes/no) 
plotted against perturbation S (here scaled to match T). 

ALife Models of Ecosystems 

Daisyworld (DW) models (Watson and Lovelock, 1983; 
Harvey, 2015) offer a simplified vision of how organisms and 
environment interact in some sense cooperatively. This can be 
compared to a very basic form of niche construction. 

Motivation for Daisyworld Models. These are not widely 
known, and where known largely misunderstood (Harvey, 
2015). The rationale is to model a number of types of 
organisms (e.g. one being ‘daisies’ D) that can survive within 
a limited range of local environmental conditions (e.g. one 
being ‘temperature’ T). Collective survival of an ecosystem of 
different organisms means all of them are currently viable in 
their local environment; robustness of an ecosystem is 
measured in terms of how wide a range of perturbing 
environmental conditions it can survive; e.g. an external ‘sun’ 
S creating hotter/colder conditions. An organism may have 
some local environmental effect (e.g. the albedo of a black 
daisy may raise local temperature), and complexity is 
measured as the number of such different effects within the 
ecosystem. The key DW result is that more such complexity 
leads to greater ecosystem robustness. 

We demonstrate this, starting from the simplest ecosystem 
with a single species; for equations see Appendix A. Figure 3 
shows schematically the basic influence of environment T on 
an organism D. Figure 4 shows the consequence of further 
adding an effect from the organism D onto the environmental 
variable T. The consequence is to extend, i.e. widen the range 
of solar forcing (perturbing external effect S) for which the 
organism is viable (Harvey, 2015). Where the effect is positive 
(e.g. the albedo of a black daisy increases local temperature), 
the solar viability range is extended towards lower values 
than otherwise. A negative effect (e.g. white daisies tend to 
reflect heat and decrease temperature) would extend the solar 
viability range towards higher values. 

Plus and Minus, Rein Control. Further, if both variants are 
potentially available with both positive and negative effects 
on the local environmental variable, temperature, they will 
collectively expand their joint eco-niche, as seen in Figure 5. 
This phenomenon depends on some basic assumptions spelt 
out in Appendix A; each variant, black or white, largely 
determines its own local temperature but with some ‘leakage’ 
between them in their shared environment. In this model, 
interactions between different ‘species’ such as Db and Dw are 
only mediated via environmental variables, rather than 
through e.g. direct predation of one on the other. The results 
here, and developed further in Harvey (2015), demonstrate 
that any changes in viability range (for Db&Dw, or Db, Dw 
individually) always increase the range and never decrease it. 

Figure 4: As Fig. 3 plus a further influence of D on T (here 
positive, black daisies increase temp.). (a) Peak response of D 
to changes in S is shifted, with a hysteresis loop. (b) D-viability 
zone is extended (here to the left) by a buffer zone, 
only effective if entered from high S values, and not low ones. 

Figure 5: As Fig. 4 plus white daisies Dw affect negatively 
their local temp Tw as well as black daisies Db affecting 
positively Tb. (a) shows steady-state values of each D. (b) 
shows viability of Db&Dw (simultaneously viable), against S. 

The expanded viability range takes the form of hysteresis 
loops as in Figure 5b. If the external perturbing force, here S, 
changes slowly, then which of the upper (viable) or lower 
(non-viable) arms of such loops is followed depends on which 
direction they are approached. In this sense history matters. 

This is an example of ‘rein control’ (Clynes, 1969; Harvey, 
2004). Clynes observed a pattern when natural organisms 
exhibit homeostasis in response to external forces threatening 
viability both from above and below (e.g. both ‘too hot’ and 
‘too cold’). Rather than one mechanism responding in two 
directions, he noted two mechanisms each responding in one 
direction only. Since reins of a horse have this same property, 
each pulls but does not push, he called this ‘rein control’. 

This is further related to Le Chatelier’s principle (Le 
Chatelier and Boudouard, 1898) as known to chemists and 
economists. This principle asserts that when any system in 
equilibrium is disturbed the system will adjust itself so as to 
(at least partially) nullify the effect of the change. A practical 
application of this principle is the use of a buffer solution 
which resists changes in pH when acid or alkali is added. 
These can be designed by chemists (Scorpio, 2000), or seen 
naturally where the bicarbonate buffering system regulates pH 
of blood in humans or other animals (Krieg et al., 2014). 

Multidimensional Daisyworld 

So far we only considered one environmental variable at a 
time: say temperature in DW, pH in the buffering examples. 
What if two or more such variables are simultaneously 
relevant, e.g. both temperature and pH? 

We can answer this within any very simple, abstract class 
of ecosystem models where (any number of) ‘organisms’ are 
modelled by ‘hat-shaped’ viability functions of (any number 
of) environmental variables; and in turn the organisms have 
any effect of any kind, positive or negative, on each or all of 
the environmental variables. In such cases it has been shown 
in the ‘Gaian Regulation Theorem’ (Harvey, 2015) that 
hysteresis loops or buffer zones as illustrated above exist 
regardless of the dimensionality of any such system. 
Perturbations in any number of dimensions will tend to be 
countered so as to widen — and never lessen — the viability 
range of any disparate group of organisms in an ecosystem, or 
of individuals in a corresponding social system. 

As an abstract example, Figure 6a shows 8 groups of 8 
species in clusters of narrow preferences for 3 environmental 
variables. In the absence of DW feedbacks at most one such 
group could be viable since the small viability spheres do not 
intersect (only P, V spheres shown here). If we add DW 
effects, different for all 8 members within each group, then 
when an external perturbation happens to pass the 
neighbourhood the whole group of 8 becomes jointly viable 
with a viability radius greatly expanded (from 0.05 to 0.218 
for effect size 0.4; details in Appendix B). The expanded 
viability spheres (as V-sphere in Figure 6b) may now overlap 
and (depending on environmental history) several such groups 
may become simultaneously viable. If the effect size were 
increased to 0.8, the mid-value perturbation C (0.5,0.5,0.5) 
would be within all 8 potentially expanded spheres, allowing 
all 64 (8x8) species with diverse environmental limits and 
diverse effects to be simultaneously viable. 

This proof of principle still only has 3 dimensions of 
environmental variables, and is symmetrically set up to 
demonstrate an effect. Real systems will typically have more 
dimensions and be highly asymmetrical and locally varied, 
with convoluted overlaps of basins of attraction. Nevertheless 
we can see that different perturbation trajectories may result in 
very different ecosystems. Trajectories matter, history matters. 

Figure 6: (a) 3 dimensions of external env. perturbations. P is 
group of 8 species, preferred env. (0.4,0.4,0.4), viable within 
radius 0.05 of this as shown by P sphere. Similar groups 
centred on Q, R...W, at corners of cube. Arrows show a 
possible trajectory of external perturbation. (b) This passes 
through viability zone of group V; DW effect consequently 
expands its viability radius to 0.218. See Appendix B. 

Such meandering paths through ecosystem-space can be 
compared with meandering evolutionary paths through DNA-space, 
and some degree of resemblance is not entirely 
accidental. From a high-level perspective the viability 
functions of DW can be related to the survival focus of 
Darwinian evolution. The natural settlement into attractors of 
the broad class of dynamical systems that is multidimensional 
DW relates directly to the natural selection of Darwinian 
evolution. Indeed the latter may be seen as a special case of 
the former. They both have surprising and counter-intuitive 
consequences; for instance an increase in an effect that 
increases the range of DW robustness usually decreases 
Darwinian fitness (Harvey, 2015). 

Where an A-Historical Analysis Differs 

The analysis of ecosystems in terms of DW, as presented here, 
is controversial (Harvey, 2015). One influential analysis (May, 
1972) of an even broader class of ‘any large complex 
system’ (that includes multidimensional DW) purports to 
contradict it, proposing that, after some critical number of 
variables is exceeded, such systems are inherently unstable. 
Three mathematical flaws in this analysis have been 
previously exposed (Harvey, 2011). We here go further in 
identifying these flaws as arising from an a-historical analysis 
that resembles the a-historical analysis of IPD (Press and 
Dyson, 2012; Stewart and Plotkin, 2013). 

May (1972) picks out an arbitrary equilibrium point of a 
large complex system and analyses its properties. This 
arbitrary choice, together with other explicit or implicit 
assumptions he makes, allows one to draw general 
conclusions; as the system gets larger, the chance that this 
specific equilibrium is stable tends towards the vanishingly 
small. This part of May’s argument resembles creationists’ 
arguments about the improbability of the ‘irreducibly 
complex’. But the fact that the probability of an arbitrary 
lottery ticket being a winner becomes arbitrarily small as the 
lottery itself gets arbitrarily big does not stop there being a 
winning ticket, or indeed many such. 

A dynamical system left to its own devices will naturally 
head towards a stable equilibrium, such a ‘winning ticket’; 
any unstable equilibrium will only be briefly observed. As 
external conditions change, such a system inevitably passes 
through a sequence of metastable states. Thus any observed 
equilibrium is almost inevitably a stable one; which 
equilibrium it is depends on the history of the system. May’s 
analysis of a generic a-historical equilibrium state has little 
relevance for the analysis of specific, observed, historically 
contingent equilibria (Harvey, 2011). Likewise the analysis by 
Press and Dyson (2012) of extortionate ZD strategies for IPD, 
or of Stewart and Plotkin (2013) of generous ZD strategies, 
has little relevance for historical contingent strategies such as 
TFTc or TFTd. 


Conclusions 

Crudely speaking, biology equals physics (and chemistry) 
plus history — stability in the short term plus the contingent 
context arising from an extended history of stability. More 
elegantly put, “Biology has always occupied a middle ground 
between the determinism of classical physics and the 
uncertainties of history” (Smith and Morowitz, 1982). When 
the physics of short-term stability is the focus of attention to 
the exclusion of contingent history, key concerns that can 
characterise complex systems can be missed. 

It may be more than a coincidence that Press, Dyson, 
Stewart, Plotkin and May, variously cited and criticised above, 
all come from physics backgrounds. Another physicist, 
Rutherford (Birks, 1962), is quoted as saying “All science is 
either physics or stamp-collecting”. If the latter is interpreted 
as contingency, it need not be taken as derogatory; this is not 
only important for understanding real biology and social 
science but equally so for Artificial Life models of these. 

In biological systems internal DNA is one obvious marker 
of a history, but other external markers may also be crucial. In 
polycentric social contracts (Ostrom, 1990) there may be 
multiple overlapping simultaneous systems of governance; 
likewise in polycentric organisms, polycentric ecosystems. 
Adaptations (and neutral changes) in any one system layer are 
within (and constrained by) the contingent current context of 
the others. Complexity of the whole arises through such 
adaptive/neutral trajectories through history, and cannot be 
explained a-historically. 

A specific novel observation in this paper, apparently not 
noted by other commentators, is that the recently discovered 
extortionate ZD strategies in IPD (Press and Dyson, 2012), 
together with their generous cousin strategies (Stewart and 
Plotkin, 2013), have very little relevance to any biological or 
social studies of cooperation because they are all avowedly a- 
historical. Their Markovian assumptions are mathematically 
powerful but implausible as models of reality. The same 
applies to May’s (1972) analysis of large complex systems. 

In passing we have noted that the blockchain of Bitcoin in 
its present form cleverly maintains the global history of 
transactions, and the full history is needed to establish the 
current state of accounts; in this sense the blockchain is 
historical. However the institutional framework of Bitcoin 
currently has no mechanism for adaptive change as per 
Ostrom’s principle 3; Bitcoin itself is a-historical. 

Successful real social systems and ecosystems have a 
history of adapting to circumstances, and this gives context to 
their current stability. Artificial Life models should reflect 
this, and there are currently many promising research areas 
that give scope for developing currently deficient a-historical 
models to take account of such contingency. History matters. 
Appendix A 

Figure 5 shows ‘black’ and ‘white’ daisies, Db and Dw, and 
respective local temperatures Tb, Tw (Harvey, 2004). Figure 4, 
using Db only, is similar except that Dw is clamped to 0. 

Daisy viability w.r.t. local temperature is based on a ‘hat-shaped’ 
function H(T) with (Figure 3) peak value 1.0 at T_opt, 
reducing to zero outside some limited viability range. Results 
are not qualitatively changed by different hat shapes. 

(1) $H(T) = \max(0,\ 1 - \alpha\,|T_{opt} - T|)$ 

Parameter α sets the slope of the hat, hence the radius (=1/α) of 
daisy-viability in terms of its local temperature. Parameter ρ sets 
the rate at which daisy-viability moves towards the hat-function: 

(2) $dD_b/dt = \rho\,(H(T_b) - D_b)$ 

(3) $dD_w/dt = \rho\,(H(T_w) - D_w)$ 

The local temperature Tb of black daisies Db is based on the 
solar insolation S, altered (i) by positive influence from the 
black daisies, and (ii) by equilibration towards Tw. Tw is 
conversely affected; white daisies have a negative effect. On the 
assumption that temperatures settle faster than the rate of change 
of daisies we can use the steady-state values as in (Harvey, 
2004). Using T' for intermediate values of T, phase (i) is: 

(4) $T'_b = S + \gamma D_b$ 

(5) $T'_w = S - \gamma D_w$ 

where γ parameterises the effect size for black/white daisies 
increasing/decreasing their own local temperatures. Phase (ii) 
gives the final temperature T as a compromise between each 
individual T' and their average current values; there is some 
‘leakage’ (Harvey, 2004), here parameterised via δ (for 
0<δ<1), between temperatures of black and white daisies: 

(6) $T_b = \delta\,T'_b + (1-\delta)\,\tfrac{1}{2}(T'_b + T'_w)$ 

(7) $T_w = \delta\,T'_w + (1-\delta)\,\tfrac{1}{2}(T'_b + T'_w)$ 

If we choose δ=0.5, then algebraic manipulation shows that 
equations (4,5) together with (6,7) can be replaced by: 

(8) $T_b = S + \varepsilon\,(3D_b - D_w)$ 

(9) $T_w = S + \varepsilon\,(D_b - 3D_w)$ 

where for convenience we substitute ε (= γ/4) for parameter γ. 
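
The algebraic step can be written out as a check. The sketch reads the second term of (6,7) as the average of the two intermediate temperatures, as the text describes, and writes \gamma for the effect size with the leakage parameter set to 0.5:

```latex
\begin{aligned}
T_b &= \tfrac12 T'_b + \tfrac12\cdot\tfrac12\,(T'_b + T'_w)
     = \tfrac34 T'_b + \tfrac14 T'_w \\
    &= \tfrac34 (S + \gamma D_b) + \tfrac14 (S - \gamma D_w)
     = S + \tfrac{\gamma}{4}\,(3D_b - D_w), \\
T_w &= \tfrac14 T'_b + \tfrac34 T'_w
     = \tfrac14 (S + \gamma D_b) + \tfrac34 (S - \gamma D_w)
     = S + \tfrac{\gamma}{4}\,(D_b - 3D_w).
\end{aligned}
```

The common factor \gamma/4 is the substituted parameter in (8,9).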

Equations (1), (2,3) and (8,9) can be simulated 
computationally by choosing some specific value for S, and 
running these equations from starting values for D, T, until 
steady-state is reached. In hysteresis regions, the end-states 
reached will depend on the starting states. To plot one branch 
of each hysteresis loop, S should be initialised at a low value, 
and the computation run until D, T reach steady-state. Then S 
is incremented slightly, keeping current values of D, T as new 
starting values for the next run; this is further repeated, 
through to high values of S. If the process is then reversed, 
moving from high S to low S, the other branches of the 
hysteresis loops can be plotted. In Figure 5b, the viability of 
Db+w is plotted as: IF (Db> 0 AND Dw>0) plot 1, ELSE plot 0. 
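
The sweep procedure can be sketched in code. The paper does not state parameter values, so the optimum temperature, hat slope, Euler step, effect size and S grid below are all assumptions chosen so that a hysteresis loop appears. The hat function is taken to peak at the optimum with radius 1/ALPHA, as described for equation (1), and all names are hypothetical.

```python
T_OPT, ALPHA, RHO_DT, EPS = 20.0, 0.2, 0.1, 2.0  # assumed values, not from the paper

def hat(temp: float) -> float:
    """Equation (1), read as peak 1.0 at T_OPT and radius 1/ALPHA."""
    return max(0.0, 1.0 - ALPHA * abs(T_OPT - temp))

def settle(s: float, db: float, dw: float, white: bool, steps: int = 2000):
    """Euler-iterate equations (2,3) with temperatures from (8,9)."""
    for _ in range(steps):
        tb = s + EPS * (3 * db - dw)
        tw = s + EPS * (db - 3 * dw)
        db += RHO_DT * (hat(tb) - db)
        dw = dw + RHO_DT * (hat(tw) - dw) if white else 0.0  # Dw clamped to 0
    return db, dw

def sweep(s_values, white: bool = False) -> dict:
    """Step S through `s_values`, carrying D over as the next starting state."""
    db = dw = 0.0
    out = {}
    for s in s_values:
        db, dw = settle(s, db, dw, white)
        out[s] = (db, dw)
    return out

grid = [10.0 + 0.5 * i for i in range(41)]   # S = 10.0, 10.5, ..., 30.0
up = sweep(grid)                              # low S to high S
down = sweep(grid[::-1])                      # high S to low S
for s in (14.5, 15.0, 20.0):
    print(s, round(up[s][0], 3), round(down[s][0], 3))
```

With these assumed values and black daisies only (the Figure 4 case), the daisies persist at S = 14.5 and 15.0 on the downward sweep but are absent there on the upward sweep, while both sweeps agree at S = 20. Calling `sweep(grid, white=True)` runs the two-variant case of Figure 5, for which no particular output is claimed here.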

Appendix B 

Figure 6 shows 3 dimensions of external env. perturbations. 
Viability of group of 8 species at P is 1.0 at (0.4,0.4,0.4), 
decreasing linearly to 0.0 at radius (Euclidean distance) 0.05. 
Each species has different +/- effects on 3 respective env. 
variables (2^3 = 8 variants); signs differ, but effect size is 
always 0.4. The other 7 groups (Q,...,W) are formed similarly. 

Effects of a P-species are multiplied by their viability and 
have two local contributions: half serves to shift the P-group 
local env. away from the perturbing force (and is thus shared 
with other P-members; ‘leakage’); and half shifts the species- 
specific env. away from the P-local env. Over a trajectory of 
env. perturbations, at each point 20,000 computational 
iterations altered viability by 0.001 and local env. variables by 
0.005 of their indicated shift. This smoothing of dynamics, 
together with the inheritance of previous env. values as 
perturbations changed, avoided numerical instabilities. A 
species was considered extinct if viability<0.01. 

An effect size 0.4 expanded viability radius of each group 
from 0.05 to 0.218; effect size 0.8 expanded it further to 0.35. 


References 

Axelrod, R., (1984). The evolution of cooperation. Basic Books, NY. 

Birks, J. B. (1962). Rutherford at Manchester. Heywood, London. 

Clements, F. E., (1916). Plant succession; an analysis of the development 
of vegetation. Carnegie Institute of Washington. 

Clynes, M., (1969). Cybernetic implications of rein control in perceptual 
and conceptual organisation. Ann. NY Acad. Sci. 156:629-670. 

Harvey, I., (2004). Homeostasis and rein control: from Daisyworld to 
active perception. In Pollack, J., Bedau, M., Husbands, P., Ikegami, 
T. and Watson, R. A. (Eds.), Proc. 9th Int. Conf. on Sim. and Syn. of 
Living Systems, ALIFE 9. MIT Press, Cambridge, MA. 

Harvey, I., (2011). Opening stable doors: complexity and stability in 
nonlinear systems. In Lenaerts, T. et al., (Eds.), Advances in 
Artificial Life, ECAL 2011, pp 805-812, MIT Press. 

Harvey, I., (2015). The circular logic of Gaia: fragility and fallacies, 
regulation and proofs. In Andrews, P., Caves, D., Doursat, R., 
Hickinbotham, S., Polack, F., Stepney, S., Taylor, T. and Timmis, J. 
(Eds.), Proc. Eur. Conf. on Artificial Life 2015, MIT Press. 

Hobbes, T., (1651). Leviathan. Andrew Crooks (publisher), at the Green 
Dragon in St. Pauls Church-yard, London. 

Krieg, B. J., Taghavi, S. M., Amidon, G. L., Amidon, G. E., (2014). In 
vivo predictive dissolution: transport analysis of the CO2, 
bicarbonate in vivo buffer system. J. Pharm. Sci. 103(11):3473-3490. 

Laland, K. N. and Sterelny, K., (2006). Perspective: seven reasons (not) to 
neglect niche construction. Evolution, 60(9), 1751-1762. 

Le Chatelier, H. and Boudouard, O., (1898). Limits of flammability of 
gaseous mixtures. Bull. de la Soc. Chim. de France, 19:483-488. 

Lewontin, R. C., (1969). The meaning of stability. Brookhaven Symposia 
in Biology, 22:13-23. 

Lindgren, K., (1991). Evolutionary phenomena in simple dynamics. In 
Farmer, J. D., Rasmussen, S. and Taylor, C., (Eds.), Artificial Life II. 
Addison-Wesley, Redwood City, CA. 

May, R. M., (1972). Will a large complex system be stable? Nature 238, 
413-415. 

Miller, M. B. and Bassler, B. L., (2001). Quorum sensing in bacteria. 
Annu. Rev. Microbiol. 55:165-199. 

Nakamoto, S., (2008). Bitcoin, an electronic peer-to-peer cash system. 
Url: https://bitcoin.org/bitcoin.pdf 

Ostrom, E., (1990). Governing the Commons: the evolution of institutions 
for collective action. Cambridge University Press. 

Ostrom, E., Walker, J. and Gardner, R., (1992). Covenants with and 
without a sword: self-governance is possible. American Political 
Science Review 86(2), 404-417. 

Press, W. H. and Dyson, F. J. (2012). Iterated Prisoner’s Dilemma 
contains strategies that dominate any evolutionary opponent. Proc. 
Nat. Acad. Sci. 109(26), 10409-10413. 

Scorpio, R. (2000). Fundamentals of acids, bases, buffers and their 
application to biochemical systems. Kendall Hunt, Dubuque, IA. 

Smith, T.F. and Morowitz, H. J., (1982). Between history and physics. J. 
Mol. Evol. 18(4), 265-282. 

Stewart, A. J. and Plotkin, J. B., (2013). From extortion to generosity, 
evolution in the Iterated Prisoner’s Dilemma. Proc. Nat. Acad. Sci. 
110(38), 15348-15353. 

Watson, A. J. and Lovelock, J. E., (1983). Biological homeostasis of the 
global environment: the parable of Daisyworld. Tellus 35B:284-289. 





How ecological inheritance can affect the evolution of complex niche construction in a 2D physical simulation 

Naoaki Chiba, Reiji Suzuki and Takaya Arita 

Graduate School of Information Science, Nagoya University 
Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan 
Email: chiba@alife.cs.is.nagoya-u.ac.jp 


Abstract 

Niche construction is a process in which organisms modify 
the selection pressure on themselves and others through 
their ecological activities. Various evolutionary models of the 
effects of niche construction on evolution have revealed that 
it can bring about unexpected evolutionary scenarios. However, 
little is known about how niche-constructing behaviors that 
build complex physical structures (such as nest-building) 
can emerge through the course of evolution, even though they 
are among the most ubiquitous and significant niche-constructing 
behaviors. Our purpose is to obtain knowledge of the emergence 
and evolution of physically-grounded niche construction 
and the effect of its ecological inheritance on evolution. 
We construct an evolutionary model in which a virtual organism 
has to arrive at a goal by constructing a physical niche 
composed of objects in a physically simulated environment. 
In particular, we focus on the effects of the degree of ecological 
inheritance, which is represented as a weathering probability 
of the objects that an offspring ecologically inherits from its 
parent. We show that it has a nonlinear effect on the adaptivity 
of the population. In the case of no ecological inheritance, 
adaptive niche-constructing behaviors such as valley-filling or 
ramp-placing strategies emerged, which created complex structures 
composed of multiple objects. It also turned out that stable 
ecological inheritance of constructed structures could increase 
the adaptivity of the population by allowing an organism to 
maintain the inherited adaptive structures, whereas unstable 
ecological inheritance instead decreased the adaptivity of the 
population by turning previously adaptive structures into 
maladaptive obstacles. 

Introduction 

All creatures, to a greater or lesser extent, change their 
own and others’ niches through their ecological activities, 
which modify the selection pressure on themselves 
and others. This process is called “niche construction” 
(Odling-Smee et al., 2003). Niche construction processes 
are seen in the ecological activities of many species, such 
as plants (photosynthesis), nonhuman animals (nest building) 
and humans. Recently, niche construction has also been 
recognized as an important factor in considering open-ended 
evolution (Taylor, 2015). 

A typical example of niche-constructing organisms is 
earthworms, which change the structure and chemistry of soils 
through their burrowing behaviors. These changes accumulate 
over generations and bring about different environmental 
conditions, which expose successive populations to different 
selection pressures. This effect is called “ecological 
inheritance”, as it makes each generation inherit a legacy of 
modified selection pressures from ancestral organisms 
(Odling-Smee et al., 2003). 

Figure 1: A diagram of evolution, niche construction and 
ecological inheritance (based on Odling-Smee et al., 2003). 
Diagram labels: niche construction; selection pressure. 

The effects of niche construction on evolution have 
mainly been investigated using mathematical and simple 
computational models, in which the effects of niche-constructing 
behaviors are represented as changes in variables that 
represent environmental states. The environmental state 
in Laland et al.’s population genetics model (Laland et al., 
1996) is represented as the amount of a resource, which can 
be directly increased by the niche-constructing behavior. 
Han et al. extended a version of Laland et al.’s model to a 
patch occupancy model in which the amount of distributed 
resources can be modified through niche construction 
(Han et al., 2009). They showed that three different spatial 
patterns of the metapopulation emerged depending on the 
ecological imprint of niche construction. Suzuki and Arita 
constructed an evolutionary model incorporating both 
learning (change in a phenotypic value) and niche construction 
(change in an optimal phenotypic value), and reported that 
cyclic coevolution of genes for learning and niche construction 
was observed when the temporal locality of ecological 
processes was low (Suzuki and Arita, 2010). Harvey also 
constructed a simplified version of the Daisyworld model in 
order to understand interactions between the environmental 
state (i.e., the temperature of the planet) and organisms (i.e., 
black and white daisies that increase and decrease the 
temperature, respectively) (Harvey, 2004). He reported that 
homeostasis of the global temperature emerged through 
changes in the proportion of black and white daisies in 
response to global warming. These studies clarified various 
effects of niche construction on evolution in cases where the 
environmental state is represented as a quantity (e.g., 
resources, optimal phenotypic values, temperature). 

On the other hand, an important feature of niche construction 
is that it can create complex physical structures 
composed of many components, which cannot be represented 
quantitatively. Nest building is a typical and ubiquitous 
example of such behavior. For example, a beaver 
makes a dam with branches; the dam stems the flow of a river 
and influences many organisms (Odling-Smee et al., 
2003). Taylor presented an individual-based model of complex 
niche construction that can make the shape of a 
one-dimensional fitness landscape complex (Taylor, 2004). He 
showed that the evolved organisms that performed more 
complex niche construction had more genes, which implies 
a continuous increase in the complexity of organisms. 
Kojima et al. constructed an evolutionary model in which 
each individual has a strategy for the Prisoner’s Dilemma and 
a trait for creating a physical structure that can limit social 
interactions between neighboring individuals (Kojima et al., 
2014). They found that, when the degree of ecological 
inheritance was high, a stable pattern of the physical structure 
emerged. It enabled cooperators to reduce the number of 
interactions with defectors while keeping the number of 
interactions with other cooperators moderate. Although these 
studies discussed complex or physical niche construction, they 
are still abstract in the sense that physically-grounded 
interactions between organisms and environments were not 
considered. 

The purpose of this study is to obtain knowledge of the 
emergence and evolution of physically-grounded niche 
construction and the effect of its ecological inheritance on 
evolution. In particular, we focus on the effects of the degree 
of ecological inheritance, which can differ depending on the 
properties of niches and ecological dynamics. We discuss 
whether and how complex interaction processes between 
organisms and environments can bring about non-trivial 
evolutionary dynamics depending on the degree of ecological 
inheritance. 

We adopt a model of virtual organisms, which is recognized 
as a novel platform for discussing recent topics in 
evolutionary research such as eco-evolutionary feedbacks 
(Ito et al., in press). We construct an evolutionary model in 
which a virtual organism has to arrive at a goal by constructing 
a physical niche composed of some objects. We adopt a 
physically simulated environment based on a physics engine 
for 2D games. We show that the degree of ecological inheritance 
can have a nonlinear influence on the adaptivity of the 
population, facilitating or retarding the evolution of 
niche-constructing behaviors that build complex and 
physically-grounded structures. 

Figure 2: The field for fitness evaluation. 


Model 

Field and task 

We use Box2D (Catto, 2016), an open source physics engine 
for 2D games, in order to introduce a physically simulated 
environment into our model. Box2D can simulate physical 
interactions between 2D objects, such as friction and 
collision between rigid bodies. 

In our model, we assume an x-y coordinate plane that 
represents a horizontal and vertical space, with gravity acting 
along the y-axis toward the bottom. The simulation is updated 
every small time step dt (seconds). Hence, the physical 
environment is updated 1/dt times per second. 
We used the default parameters that define the properties of 
the physical environment, with a few modifications (see 
footnote 1). 

We assumed a 1160 × 360 virtual space as shown in Fig. 
2. The field consists of square “field tiles” with a side length 
of 20. There is a special field tile at the right end of the field, 
representing the goal. There are two valleys composed of field 
tiles in the field: the left one is shallow and wide, whereas the 
right one is deep and narrow. The virtual organism is initially 
placed on a starting point at the left end of the field. The task 
for the organism is to move from the start to the goal as many 
times as possible within the time limit of T seconds, under the 
assumption that the organism is returned to the start after 
reaching the goal. Specifically, the fitness of the virtual 
organism is calculated by the following equation (Eq. 1): 


\[ \mathit{fitness} = g + \frac{\mathit{dis}_{goal} - \mathit{dis}}{\mathit{dis}_{goal}} \qquad (1) \]

where $g$ is the number of times the virtual organism 
arrived at the goal, $\mathit{dis}_{goal}$ is the distance between the 
start and the goal, and $\mathit{dis}$ is the distance between 
the goal and the position of the virtual organism at the end of 
fitness evaluation. Therefore, the more times the virtual 
organism arrives at the goal and the closer to the goal it is at 
the final step, the higher its fitness. We expect that a 
non-niche-constructing organism will get stuck in the first 
valley because it cannot climb out of it, while 
niche-constructing organisms have a possibility of reaching the 
goal by placing some objects in the field. 

1 Gravitational acceleration 9.8 (m/s^2), density 1.0 (kg/m^2), 
coefficient of friction 0.7 for the virtual organism and 
coefficient of friction 0.3 for everything except the virtual 
organism. 
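
Eq. 1 can be evaluated as in the following minimal Python sketch. The function and argument names are hypothetical and chosen only for illustration; the numeric values in the usage note are made up and are not results from the experiments.

```python
def fitness(goal_count, dis_goal, dis):
    """Eq. 1: completed runs plus fractional progress toward the goal.

    goal_count -- number of times the organism reached the goal (g)
    dis_goal   -- distance between the start and the goal
    dis        -- distance between the goal and the final position
    """
    return goal_count + (dis_goal - dis) / dis_goal
```

For instance, an organism that reached the goal twice and ended halfway along a third run would score `fitness(2, 1000.0, 500.0)`, i.e. 2.5, while one that never left the start would score 0.0.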

Virtual organism 

In our model, a circular organism with a radius of 
20 can move in the field by rotating its body to the left or 
right (Fig. 3). It can also place objects (see footnote 2) in the 
field. This is a niche-constructing behavior in our model in the 
sense that constructed structures can affect the future 
adaptivity of the organism. There are two types of objects: a 
“box” with a side length of 16 and a “board”, which is a 
6 × 60 rectangle. A virtual organism has two areas around it: 
its visibility range and the motion range of its arm (Fig. 3, 
right). The visibility range is a circular region around the 
organism with radius FV, and the organism can recognize 
objects and field tiles within this area. The motion range of its 
arm is also a circular region, with radius L, and the organism 
can place objects within this area. A placed object will fall 
onto other objects or field tiles due to gravity if it is placed in 
the air. There is no cost for placing objects. 

A three-layer neural network, whose weights are defined 
by the genotype of an organism, determines the behavior 
of the organism (Fig. 3, left). We use a sigmoid function 
as the transfer function in the neural network. Values 
are fed to the neural network every time the 
physical environment is updated. The following values are 
fed to the input layer: the numbers of field tiles, boxes 
and boards within the visibility range; the x-y positions of 
their centers of gravity relative to the organism; and the 
number of available objects, which will be explained later. 

The output layer consists of one neuron that decides the 
direction of rotation and five neurons related to placing 
objects. The first neuron decides in which direction the virtual 
organism rotates and moves. If its output value is higher 
than 0.5, a clockwise torque $\tau$ with a magnitude of 
100,000 (kgf · m) is applied to the virtual organism; otherwise 
an anti-clockwise torque is applied. The second neuron 
decides whether the virtual organism places an object or 
not. If its value is larger than 0.5, the virtual organism places 
an object in the field. The third neuron decides which type of 
object the virtual organism places. If its value 
is higher than 0.5, the virtual organism places a box; otherwise 
it places a board. The fourth and fifth neurons decide 
the position at which the virtual organism places the object 
within the motion range of its arm. The position is represented 
by the polar coordinates $(r, \theta) = (L \times x_4, 2\pi \times x_5)$, 
where $x_4$ and $x_5$ represent the fourth and the fifth output 
values, respectively. The last neuron decides the angle of 
rotation of the object, $\phi = 2\pi \times x_6$, where $x_6$ represents the 
sixth output value. If the object would overlap existing 
field tiles or objects in the field, or would lie outside the 
field, the action of placing the object is canceled and nothing 
happens. 

2 In this paper, the term “objects” refers to boxes and boards 
placed by an organism. It does not include field tiles. 

Figure 3: The neural network of a virtual organism. 
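
The mapping from the six output values to an action can be sketched as follows. This is an illustrative Python sketch under stated assumptions rather than the authors' implementation: the function name, the dictionary keys and the conversion of the polar coordinates to an x-y offset relative to the organism are all hypothetical, and the overlap and out-of-field checks are omitted.

```python
import math


def decode_outputs(outputs, arm_range):
    """Translate six sigmoid outputs (each in [0, 1]) into one action.

    outputs   -- sequence (x1, ..., x6) from the output layer
    arm_range -- radius L of the arm's motion range
    """
    x1, x2, x3, x4, x5, x6 = outputs
    r = arm_range * x4            # radial distance of the placement point
    theta = 2.0 * math.pi * x5    # direction of the placement point
    return {
        "rotation": "clockwise" if x1 > 0.5 else "anticlockwise",
        "place": x2 > 0.5,
        "kind": "box" if x3 > 0.5 else "board",
        # Offset from the organism's centre (assumed convention).
        "offset": (r * math.cos(theta), r * math.sin(theta)),
        "angle": 2.0 * math.pi * x6,  # rotation of the placed object
    }
```

A caller would apply the torque indicated by `rotation` at every update and attempt a placement only when `place` is true and the target position is free.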

Moreover, the parameter B determines the maximum 
number of objects that can exist in the field. It reflects the 
amount of available resources for niche-constructing 
behaviors. The organism can make use of the number of available 
objects for decision making as an input to its neural network. 

Evolution and ecological inheritance 

A virtual organism has synaptic weights in its neural 
network whose values are determined by its own genes. The 
population of organisms evolves according to a genetic 
algorithm. In the initial generation, there are N virtual 
organisms and the values of their genes are randomly assigned 
between -1.0 and 1.0. After the fitness evaluation of all 
organisms, a pair of parents is selected by roulette-wheel 
selection in accordance with fitness. They produce a 
pair of offspring with the same genotypes as themselves, 
and a two-point crossover occurs between the offspring 
with probability PC. Each gene can mutate with a 
small probability PM. If a mutation occurs, a random number 
in [−DM, DM] is added to the value of the gene. This 
process is repeated until the number of offspring 
reaches N. 
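
One generation of the genetic algorithm described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' code: the function name is hypothetical, genotypes are assumed to be flat lists of floats, the mutation offset is assumed to be drawn uniformly, and a tiny constant is added to the selection weights so that selection still works when every fitness is zero.

```python
import random


def next_generation(population, fitnesses, pc=0.7, pm=0.001, dm=0.003, rng=random):
    """Roulette-wheel selection, two-point crossover and additive mutation."""
    n = len(population)
    # Assumption: clip negatives and add epsilon so the weights sum to > 0.
    weights = [max(f, 0.0) + 1e-9 for f in fitnesses]
    offspring = []
    while len(offspring) < n:
        parent_a, parent_b = rng.choices(population, weights=weights, k=2)
        child_a, child_b = list(parent_a), list(parent_b)
        if rng.random() < pc:
            # Two-point crossover: swap the segment between two cut points.
            i, j = sorted(rng.sample(range(len(child_a) + 1), 2))
            child_a[i:j], child_b[i:j] = child_b[i:j], child_a[i:j]
        for child in (child_a, child_b):
            for k in range(len(child)):
                if rng.random() < pm:
                    child[k] += rng.uniform(-dm, dm)
        offspring.extend([child_a, child_b])
    return offspring[:n]
```

Each returned pair `(child_a, child_b)` keeps the order of its parents, which is what allows an environment to be paired with each offspring afterwards.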

Furthermore, we introduce ecological inheritance into 
the model in order to investigate its effect on the evolution 
of niche construction. For each pair of offspring, the 
environmental state of one parent is inherited by the environment 
of one offspring, and the environmental state of the other parent 
is inherited by that of the other offspring. Specifically, 
each offspring inherits the environmental state of the 
corresponding parent at the end of its fitness evaluation process. 
This means that all objects in the parent’s environment 
are copied to the offspring’s environment, keeping their 
types, positions and rotations the same. 

Figure 4: The fitness and the weathering probability we. 

Figure 5: A valley-filling strategy. 

Figure 6: A ramp-placing strategy. 

Figure 7: Obstacles preventing an organism from crossing 
a valley. 

In addition, the degree of such ecological inheritance 
can vary depending on environmental conditions in the real 
world. Thus, we also introduce a probability we into our 
model, which represents the probability of weathering of each 
object. Each inherited object vanishes independently with 
probability we. The higher the probability we is, the fewer 
objects the virtual organism inherits. We conduct the whole 
process of evolution and ecological inheritance for G 
generations. 
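
The weathering step can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation: the function name and the representation of each object as a dictionary of type, position and rotation are assumptions made for the example.

```python
import random


def weather_inherited_objects(parent_objects, we, rng=random):
    """Copy the parent's objects, dropping each one with probability `we`.

    parent_objects -- list of dicts describing objects left at the end of
                      the parent's fitness evaluation (assumed format)
    we             -- weathering probability per object, in [0, 1]
    """
    # rng.random() lies in [0, 1), so we = 1.0 removes every object and
    # we = 0.0 keeps every object.
    return [dict(obj) for obj in parent_objects if rng.random() >= we]
```

With n objects in the parent's environment, the expected number inherited is n × (1 − we); for example, about 39.6 of 40 objects when we = 0.01 and 36 of 40 when we = 0.1.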

Result 

We conducted evolutionary experiments using the following 
parameters: N = 50; G = 2,000; dt = 0.02; T = 200; 
B = 25, 40 and 55; FV = 500; L = 125, 250 and 500; 
we = $10^{-2.0}$, $10^{-1.5}$, $10^{-1.0}$, $10^{-0.5}$ and $10^{0.0}$; PC = 
0.7; PM = 0.001 and DM = 0.003. We conducted 10 
trials for each combination of the parameters B, L and we. 
We varied B and L because they are related to the richness of 
the environment and the basic ability of organisms, respectively. 
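
The experimental design above can be enumerated as in the following Python sketch, which is intended only to make the trial counts explicit. The variable names are hypothetical; the parameter values are those listed in the text.

```python
import itertools

B_VALUES = (25, 40, 55)                        # maximum number of objects
L_VALUES = (125, 250, 500)                     # radius of the arm's motion range
WE_EXPONENTS = (-2.0, -1.5, -1.0, -0.5, 0.0)   # we = 10 ** exponent
TRIALS_PER_SETTING = 10

we_values = [10.0 ** exponent for exponent in WE_EXPONENTS]

# One entry per evolutionary run: (B, L, we, trial index).
runs = [
    (b, arm, we, trial)
    for b, arm, we in itertools.product(B_VALUES, L_VALUES, we_values)
    for trial in range(TRIALS_PER_SETTING)
]
runs_per_we = len(runs) // len(we_values)
```

This gives 3 × 3 × 10 = 90 runs for each value of we, and 450 runs in total.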

Fig. 4 shows the average fitness over all (3 × 3 × 10 = 90) 
trials for each value of we. The horizontal axis represents we 
and the red squares represent the average fitness. We used 
the fitness values of the last 1,000 generations for calculating 
the average fitness in each trial to eliminate the effects of 
initial conditions. We also show a box plot of each set of 90 
trials. We see that the weathering probability we strongly 
affected the average fitness. There was a statistically 
significant difference in the fitness distribution among these cases 
(Kruskal-Wallis test, H = 81.9, $p < 1 \times 10^{-3}$). 

Here, we particularly focus on three typical cases 
of we: 0.01, 0.1 and 1.0. In the case of the probability 
we = 1.0, there is no effect of ecological inheritance because 
all objects vanish when they are handed over to the next 
generation. On the other hand, ecological inheritance 
occurs stably when the probability we is 0.01. When the 
probability we is 0.1, ecological inheritance occurs but 
it is unstable. 

In the case of we = 1.0, the average fitness was 1.8. This 
means that a virtual organism arrived at the goal one or two 
times on average in one trial. On the other hand, the average 
fitness was about 2.1 when we was 0.01. This indicates 
that the ecological inheritance of most of the objects from 
the parental generation contributed to the fitness increase. 
In comparison, the average fitness was only about 
0.6 in the case of we = 0.1, which indicates that unstable 
inheritance of objects can instead decrease the fitness. 
Because fitness is lowest at the intermediate weathering 
probability, the degree of ecological inheritance has a 
nonlinear effect on the adaptivity of the population. 

The high fitness values when we = 0.01 and 1.0 were 
obtained through the evolution of adaptive niche-constructing 
strategies. Fig. 5 and Fig. 6 show two typical strategies, which 
were commonly observed in successful trials irrespective of 
the parameter settings (except for we = 0.1). One is a 
valley-filling strategy that fills a valley with many objects 
(Fig. 5), which allows an organism to pass through the valley. 
The other is a ramp-placing strategy that creates a ramp 
from a board (Fig. 6), which allows an organism to climb from 
the bottom of a valley. The former strategy was observed more 
often than the latter, which implies that the former was more 
easily acquired through the evolution process due to its 
simplicity and its robustness against external perturbations 
such as collisions with organisms or other objects. Fig. 7 shows 
an example of objects that worked as obstacles, preventing an 
organism from crossing a valley. Below, we discuss in detail how 
and why the degree of ecological inheritance affected the 
evolution of such niche-constructing behaviors. 

Figure 8: A typical example of evolution process when 
we = 1.0, L = 500 and B = 25. 

Experiments with no ecological inheritance (we = 1.0) 

First, we analyze experimental results with the weathering 
probability we = 1.0 as a baseline in which there is 
no effect of ecological inheritance. Fig. 8 shows an example 
of the evolution of the fitness (top) and the average number 
of objects that existed at the end of fitness evaluation in each 
generation (bottom) in a trial with L = 500, B = 25 and 
we = 1.0, as a case in which an adaptive structure evolved 
successfully. While the average fitness was less than 1.0 in 
the initial generation, it quickly increased to about 3.5. It 
then further increased to about 4.0 at around the 950th 
generation. There were organisms with very low fitness 
throughout the experiment. This is because offspring of 
organisms with higher fitness sometimes cannot reach the goal 
at all due to the negative effects of genetic operations. The 
average number of placed boards was around 24, which is close 
to the number of available objects B in this case, whereas that 
of placed boxes was 0. Fig. 9 shows the structure that emerged 
at the 900th (top) and the 2,000th (bottom) generations in the 
trial in Fig. 8. We see that the organism could pass through 
the valleys by using both valley-filling and ramp-placing 
strategies with many boards. We also see that slight changes in 
the distribution of boards contributed to the fitness increase 
mentioned above. 

Figure 9: An emerged structure at the 900th (top) and the 
2,000th (bottom) generations in the trial in Fig. 8. 

To analyze the general tendency of the emerged 
niche-constructing behaviors and their adaptivity, we focus on the 
average fitness and the average numbers of the two types of 
placed objects under various settings of B and 
L, as shown in Fig. 10. Each point represents the average 
fitness (color) and the average numbers of placed boxes 
(x-axis) and boards (y-axis) during the fitness evaluation 
process over the last 1,000 generations in a single trial. Each 
subfigure corresponds to a setting of B and each type of marker 
corresponds to a setting of L. We see that the trials in 
which many objects were placed tended to have higher 
fitness, as the example above showed. This indicates that 
niche construction, that is, placing many objects, 
contributed to the adaptivity of the evolved organisms in our 
model. We also see that the virtual organism never arrived at 
the goal in some trials and thus obtained a low fitness. In these 
cases, the organisms tended to evolve non-niche-constructing 
strategies, which do not place any objects at all, because it 
is better not to place any obstacles in order to get closer to 
the goal if objects do not help in passing through valleys. 
This strategy is expected to be sub-optimal in the sense 
that, once such a strategy occupied the population, adaptive 
niche-constructing strategies rarely evolved. 

Next, we focus on the types of the objects placed in the 
field. Fig. 10 shows that the number of placed 
boards was larger than that of placed boxes, especially when 
the fitness was high. The difference in the characteristics of 
boxes and boards appears to be the cause of this. A box is 
taller than a board, so it is useful for filling 
in valleys. At the same time, however, it can become an 
obstacle if it lies in front of an organism. On the other 
hand, a board is flatter than a box and a virtual organism 
can climb over it; thus, it does not cause such a problem, 
which allows organisms to obtain higher fitness when 
they use boards. Furthermore, a board is long enough to create 
a ramp structure. It is expected that boards were used more 
frequently than boxes because of these differences in their 
characteristics. In addition, we also see that the fitness of the 
trials with B = 25 was in general higher than that of the trials 
with B = 40 and 55. This might be because objects tend to 
become obstacles if too many objects are placed in the field. 

Figure 10: The fitness and the number of the objects that existed at the end of fitness evaluation when we = 1.0. 

Figure 11: The fitness and the number of the objects that existed at the end of fitness evaluation when we = 0.01. 

Experiments with stable ecological inheritance (we = 0.01) 

Next, we discuss how the evolution of such adaptive niche 
construction is affected if constructed structures are 
inherited by the next generation. Fig. 11 shows the relation 
between the fitness and the number of objects that existed at 
the end of fitness evaluation in the case of stable ecological 
inheritance: we = 0.01. The overall trend did not change 
compared with Fig. 10, meaning that adaptive organisms 
tended to have many objects in their field. However, in this 
case, they inherited most of the objects due to the very low 
weathering rate, and they tended to add only a few boards 
during their fitness evaluation process. The fitness tended to be 
higher than in the cases with we = 1.0, especially when B was 
large (40 and 55). 

Fig. 12 shows a typical example of an evolution process 
in the case of L = 125 and B = 40 in which an adaptive 
structure evolved successfully. The top, middle and bottom 
panels represent the fitness, the average number of objects 
inherited from the previous generation and the average 
number of objects placed by an organism in the current 
generation, respectively. Except for the initial few generations, 
organisms evolved to place a few boards in each generation, 
which resulted in the accumulation of many boards 
in the field. In the last generation, on average, 38.1 boards 
and 1.5 boxes were inherited from the previous generation, 
while 0.3 boards and 0.02 boxes were placed in the field. 
This means that organisms inherited nearly the maximum 
number of boards from their parents, and they compensated 
for the vanished ones by placing a few additional boards. 
The average fitness reached around 5.0, which was higher 
than in any of the cases with the probability we = 1.0. 

Figure 12: A typical example of an adaptive evolution process 
in the case of we = 0.01, L = 125 and B = 40. 

Fig. 13 shows snapshots of the inherited environment in 
the same trial as in Fig. 12. The top, middle and bottom panels 
represent the inherited environments at the 300th, 400th 
and 500th generations, respectively. We see that two 
valley-filling structures of boards were inherited. These structures 
allowed an organism to pass through the valleys, and this is 
the main reason that the organism obtained high fitness: 
it did not need to create such adaptive structures 
from scratch. We also see that there were a few changes in 
these structures across generations. This is due to the 
weathering of a few objects; the organism maintained the adaptive 
structures by placing as many additional boards as possible 
in the field. The fitness likely tended to be high especially 
when B was large because well-organized structures with many 
objects, built up through ecological inheritance, were more 
adaptive (e.g., easy to pass through, 
robust against the weathering of objects). Therefore, in the 
case of stable ecological inheritance, organisms evolved to 
maintain an inherited adaptive structure composed of many 
objects. 

Figure 13: A typical example of an inherited environment at 
the 300th (top), 400th (middle) and 500th (bottom) generations 
in the case of we = 0.01, L = 125 and B = 40. 

Experiments with unstable ecological inheritance (we = 0.1) 

In this condition, in which 10% of the objects in the previous 
generation disappear on average, the fitness was very low in 
many trials, and the non-niche-constructing strategy evolved 
in such cases. A likely cause is the large change in 
environmental conditions between generations. Even when 
adaptive niche-constructing strategies appear and begin to 
invade the population, the adaptive structures that emerged 
in the current generation tend to become obstacles (such as 
those shown in Fig. 7) in subsequent generations because of 
their irregular shapes due to the high weathering rate. This 
prevents such adaptive strategies from invading the population, 
and further allows non-niche-constructing strategies to 
evolve. 

Fig. 14 shows a typical, if somewhat complex, example of such 
a situation with L = 125, B = 40 and we = 0.1. The fitness 
was high during the initial few generations, meaning 
that an adaptive niche-constructing strategy evolved. However, 
the fitness decreased drastically as soon as the organisms 
stopped placing objects, and it did not recover before 
the last generation. 

Once adaptive structures have emerged and been inherited by 
the next generation, placing more objects might not contribute 
to the fitness increase, or might even have a negative effect on 
it. In such a case, there can be no selection pressure, or even 
negative selection pressure, to place objects. However, once 
non-niche-constructing strategies evolved, the adaptive structure 
turned into obstacles very quickly under the high weathering 
rate, which resulted in the rapid fitness decrease 
observed in Fig. 14. 

In sum, unstable ecological inheritance has a negative 
effect on the evolution of adaptive niche-constructing 
behavior by collapsing the adaptive niches that emerge. 

Figure 14: A typical example of an evolution process in the 
case of we = 0.1, L = 125 and B = 40. 


niche-constructing behavior of organisms. They focused on 
the number of previous generations of niche construction in¬ 
fluencing the amount of resource in the current generation, 
and showed that increasing the number of previous generations
has a simple, monotonic effect on the evolution
process, yielding a more considerable time lag. In
contrast, our result implies that such environmental parame¬ 
ters can have more complex effects on the evolution process 
when there are more complex interactions between organ¬ 
isms and environments. 

Future work includes introducing other types of objects
and the evolution of object shapes into our model, and
conducting evolutionary experiments with different settings of
the field.
Abstract

Conservation ecologists have long argued over the best way 
of placing reserves across an environment to maximize pop¬ 
ulation diversity. Many have studied the effect of protecting 
many small regions of an ecosystem vs. a single large region, 
with varied results. However, this research tends to ignore 
evolutionary dynamics under the rationale that the spatiotem- 
poral scale required is prohibitive. We used the Avida digi¬ 
tal evolution research platform to overcome this barrier and 
study the response of phenotypic diversity to eight different 
reserve placement configurations. The capacity for mutation, 
and therefore evolution, substantially altered the dynamics of 
diversity in the population. When mutations were allowed, 
reserve configurations involving a greater number of conse¬ 
quently smaller reserves were substantially more effective at 
maintaining existing diversity and generating new diversity. 
However, when mutations were disallowed, reserve config¬ 
uration had little effect on diversity generation and mainte¬ 
nance. While further research is necessary before translating 
these results into policy decisions, this study demonstrates the 
importance of considering evolution when making such deci¬ 
sions and suggests that a larger number of smaller reserves 
may have evolutionary benefits. 

Introduction 

Protecting biodiversity is generally acknowledged to be an 
important conservation goal for a number of reasons, in¬ 
cluding biodiversity’s role in maintaining various ecosys¬ 
tem services (e.g. carbon sequestration), and its potential as 
a reservoir of useful and undiscovered genetic innovations 
(Gaston and Spicer, 2004; Hassan et al., 2005; Loreau et al., 
2001; Montoya et al., 2012). However, there is another rea¬ 
son that biodiversity is critically important, which is often 
overlooked - continued evolution requires diversity. Since 
adaptation to new environments will be a critical compo¬ 
nent of the long-term survival of many lineages in the face 
of climate change, it is important to consider conservation 
of biodiversity in the context of evolution (Stockwell et al., 
2003; Mace and Purvis, 2008; Smith et al., 2014). 

Most conservation biology research requires a broad spa¬ 
tial scale. Incorporating the long temporal scale required 
to study evolution makes this already challenging problem 
intractable in most cases. As a result, most attempts to
factor evolution into conservation planning decisions have,
out of necessity, been based on general evolutionary principles
rather than empirical analysis of their likely outcomes
(Cowling and Pressey, 2001; Sgro et al., 2011; Ferrière et al.,
2004). In particular, little research on conservation schemes
to date has taken evolution into account. Artificial Life tech¬ 
niques such as digital evolution have a lot of potential as 
an approach to overcoming these obstacles; they allow for 
the formation of interesting ecologies in a system with a fast 
enough generation time to do large scale evolution experi¬ 
ments. 

Here, we use digital evolution to revisit the single large
vs. several small (SLOSS) debate from a perspective
that incorporates evolutionary theory. The SLOSS
debate emerged from the theory of island biogeography
(MacArthur and Wilson, 1967). The original argument 
was that, since large islands have more species, larger re¬ 
serves should be better for conserving biodiversity (Dia¬ 
mond, 1975). However, this effect is counterbalanced by 
the fact that placing more reserves might result in sampling 
from multiple different species pools (Simberloff and Abele, 
1976). More recent refinements have considered the place¬ 
ment of the reserves relative to each other and interconnec¬ 
tivity between them (Saunders et al., 1991; Tjorve, 2010). 

Evolutionary dynamics likely add additional weight to the 
argument for several small reserves for a number of rea¬ 
sons. First, transient fitness gains can result in a single lin¬ 
eage sweeping a reserve relatively quickly and wiping out 
standing diversity. Second, separating reserves decreases the 
colonization rate, giving other lineages time to gain beneficial
mutations of their own (Whitley et al., 1998; Tomassini,
2005). Third, spatial isolation can increase the likelihood of
speciation.

Many factors interact to bring about the complex spa¬ 
tial eco-evolutionary dynamics that we observe in biological 
ecosystems. Indeed, the interactions between various fac¬ 
tors are a large part of the reason that the relative benefits 
of different reserve placement strategies have been so hard 
to untangle. Here, we seek only to lay the groundwork for 
addressing the impact of evolution on these questions. In 
order to facilitate this, we will deal with the simplest possible
case: a population of sessile, asexual organisms at the
same trophic level. Movement, sexual recombination, and 
predation likely have dramatic impacts on the resulting dy¬ 
namics. However, in order to understand these effects, we 
must first understand the behavior of a system without them. 
Additionally, for the purposes of this paper, we assume an 
entirely homogeneous environment, eliminating the possi¬ 
bility for complex interactions among habitat heterogeneity, 
species diversity, and reserve area (Kadmon and Allouche, 
2007). 

Methods 

Study System 

We conducted our experiments in silico, using the Avida
Digital Evolution Platform version 2.12.4 (available at 
https://www.github.com/devosoft/avida) (Ofria and Wilke, 
2004). Configuration files for this paper are available 
at https://github.com/emilydolson/conservationExperiment. 
The world of Avida is a two-dimensional grid of cells oc¬ 
cupied by digital organisms. These organisms are actually 
computer programs; their genomes are sequences of simple 
computer instructions. At the beginning of the experiment, 
we seed the world with a single ancestor that contains the in¬ 
structions necessary to copy itself. As organisms copy them¬ 
selves, they periodically make mistakes, introducing muta¬ 
tions. Some of these mutations will improve the efficiency 
of self-replication, so the organisms that have them will copy 
themselves faster than the others. If there is no space avail¬ 
able for an organism’s offspring, the offspring will replace 
an existing organism. As a result, there is selection for or¬ 
ganisms that can replicate themselves faster. Because there 
are mutation, inheritance, and selection, evolution by natural 
selection occurs. 

To allow for the formation of more complex ecologies, 
we can also choose to reward organisms for performing var¬ 
ious computational tasks by allowing them to execute their 
genomes faster. These tasks can be thought of as pathways 
for metabolizing various resources, and allow for different 
organisms to specialize on different survival strategies. To 
allow for the formation of a stable ecosystem, we can estab¬ 
lish negative frequency dependence by linking each task to 
a limited resource, such that organisms are rewarded for a 
task in proportion to the amount of the relevant resource that 
they have access to (Chow et al., 2004). 
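
The negative frequency dependence described above can be illustrated with a minimal sketch. This is not Avida's actual resource model: the function name, the inflow value, and the equal-share rule are assumptions made only for illustration. A fixed resource inflow per update is split among the organisms performing the associated task, so the reward per organism falls as the task becomes more common.

```python
def per_capita_reward(inflow, n_performers):
    """Reward each performer receives when a fixed resource inflow
    is split evenly among all organisms performing the task.
    (Equal sharing is an illustrative assumption, not Avida's rule.)"""
    if n_performers <= 0:
        return 0.0
    return inflow / n_performers


# Hypothetical numbers: 100 units of resource flow in per update.
rare = per_capita_reward(100.0, 5)     # few organisms perform the task
common = per_capita_reward(100.0, 50)  # many organisms perform the task
print(rare, common)
```

Because a rare strategy earns more per organism than a common one, no single task can take over the population, which is the stabilizing effect the text attributes to limited resources.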

Experimental Set-up 

For this study, we started by evolving ten populations in the 
limited resource environment described above (specifically, 
we used the same configuration settings as in (Walker and 
Ofria, 2012) except where otherwise noted). Each popula¬ 
tion was started from the same hand-coded self-replicator, 
but was then allowed to diverge for 100,000 updates, a 
length of time roughly equivalent to 2000 generations. We 


Parameter                Values
Reserve Configurations   see Figure 1
Kill Rates               100, 200, 300, 400, 500
Mutations Allowed?       Yes, No
Initial Populations      1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Table 1: Parameter values. These values were combined in
a full-factorial design, with 10 replicates per condition, for
a total of 8000 runs of Avida. Kill rate indicates a number
of grid cells that were randomly selected from the environment
each update. If these cells were not in a reserve and
contained an organism, that organism was killed.

then placed these populations in one of eight environments 
for 100,000 more updates. Each environment had reserves 
placed across it in a different configuration. All environ¬ 
ments had a total of 900 out of the 3600 grid cells placed 
in square reserves that tiled evenly across the environment. 
Reserve configurations varied from having many very small 
reserves to having a single very large reserve (see Figure 
1). Because world size was held constant, configurations 
with more reserves necessarily involved them being placed 
closer together (there was always one reserve worth of space 
between reserves on all sides). Organisms living in areas 
outside of the reserves were at risk of being randomly killed 
each update (see Table 1). Offspring were placed probabilis¬ 
tically near their parents, according to a Poisson distribution, 
to create a spatial population structure roughly analogous to 
tree seed dispersal. In order to ascertain what effect allow¬ 
ing populations to evolve was having on our results, we also 
ran a series of controls in which mutations were disallowed 
for the second 100,000 updates. We ran 10 replicates per 
treatment in a fully factorial design across initial population, 
environment, five kill rates, and mutations being allowed vs. 
disallowed. 
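
The set-up described above can be sketched in a few lines. The sketch rests on several assumptions that are not stated in the text: a 60 x 60 toroidal world (inferred from the 3600 cells), reserve side lengths that tile such a world evenly, and kill cells drawn with replacement. The actual configurations are those shown in Figure 1 and in the configuration files linked in the Study System section; the function names here are hypothetical.

```python
import itertools
import numpy as np

WORLD = 60  # assumed side length: 60 x 60 = 3600 cells


def reserve_mask(side, world=WORLD):
    """True for cells inside square reserves of the given side length,
    separated by one reserve-width of unprotected space on a torus."""
    if world % (2 * side) != 0:
        raise ValueError("this reserve size does not tile the world evenly")
    in_band = (np.arange(world) % (2 * side)) < side
    return in_band[:, None] & in_band[None, :]


def apply_kills(occupied, mask, kill_rate, rng):
    """Pick kill_rate random cells (with replacement, an assumption) and
    clear any that are unprotected; cells inside reserves are untouched."""
    cells = rng.integers(0, occupied.size, size=kill_rate)
    rows, cols = np.unravel_index(cells, occupied.shape)
    unprotected = ~mask[rows, cols]
    occupied[rows[unprotected], cols[unprotected]] = False
    return occupied


# Side lengths that tile the assumed world evenly.
sides = [s for s in range(1, WORLD // 2 + 1) if WORLD % (2 * s) == 0]
masks = {s: reserve_mask(s) for s in sides}

# Full-factorial design from Table 1, with 10 replicates per condition.
conditions = list(itertools.product(sides, [100, 200, 300, 400, 500],
                                    [True, False], range(1, 11)))
n_runs = len(conditions) * 10
print(len(sides), n_runs)
```

Under these assumptions every mask protects exactly one quarter of the world, and the number of runs agrees with the total given in Table 1.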

Analysis 

There are three possible mechanisms by which our reserve 
schemes could drive changes in diversity over evolutionary 
time - a reserve configuration might: 1) sample from a dif¬ 
ferent range of locations across the environment, 2) promote 
improved maintenance of existing diversity, and/or 3) pro¬ 
mote improved generation of new diversity. To address pos¬ 
sible mechanism (1), we measured the number of pheno¬ 
types in any reserve at the beginning of the experiment for 
each condition. To address possible mechanisms (2) and (3), 
we collected a variety of data on the phenotype-area rela¬ 
tionships in our data: phenotype richness within each re¬ 
serve, total phenotype richness captured across all reserves 
in a replicate, count of phenotypes lost over the experimen¬ 
tal treatment, and count of new phenotypes that evolved over 
the course of the experimental treatment. 
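
The counts listed above reduce to set operations on phenotype labels. The sketch below is illustrative only: the function name is hypothetical, and phenotypes are represented as arbitrary hashable labels rather than in whatever form the original analysis scripts used.

```python
def diversity_changes(initial_phenotypes, final_phenotypes):
    """Summarize richness, loss, and novelty between two phenotype samples."""
    initial = set(initial_phenotypes)
    final = set(final_phenotypes)
    return {
        "initial_richness": len(initial),
        "final_richness": len(final),
        "lost": len(initial - final),  # present at the start, absent at the end
        "new": len(final - initial),   # absent at the start, present at the end
    }


# Hypothetical phenotype labels for one replicate.
summary = diversity_changes(["A", "B", "C", "C"], ["B", "C", "D", "E"])
print(summary)
```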

Figure 1: Reserve configurations. Black cells are part of reserves, while white cells are unprotected. Note that the world is
toroidal, so spacing between all reserves within a condition is equivalent.

All analyses were conducted using the R Statistical Computing
Language, version 3.2.3 (R Core Team, 2013). For statistics
in which the unit of replication was a single run of Avida
(phenotype loss and generation of novelty), we used a two-way
ANOVA in which initial population, reserve size, and
their interaction were treated as random effects. Effect sizes 
were calculated as partial eta squared (Lakens, 2013). We 
also calculated some statistics (alpha richness) for which the 
unit of replication was individual reserves. To account for 
the non-independence that this set-up introduced, we used 
linear mixed models with random effects for initial popula¬ 
tion and replicate, as implemented in the lme4 R package 
(Bates et al., 2015). 
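
As a rough aid to reading the effect sizes, partial eta squared can be recovered from an F statistic and its degrees of freedom. The sketch below uses the standard identity for an F ratio of effect to error mean squares; whether the original analysis computed it this way is an assumption, and the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic:
    SS_effect / (SS_effect + SS_error) = F * df1 / (F * df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F(1, 3980) = 278.47, as reported for newly evolved phenotypes.
print(round(partial_eta_squared(278.47, 1, 3980), 3))
```

Applied to the F statistics in the Results, this identity gives values close to the reported effect sizes (about .30 for phenotype loss and .065 for newly evolved phenotypes).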

Results and Discussion 

Overall, the replicates in which mutations were allowed dur¬ 
ing the second 100,000 updates had substantially higher di¬ 
versity (both richness and Shannon entropy) at the end of the 
experiment than replicates for which mutations were disallowed
(see Figure 3). Runs in which mutations were disallowed
had an average of 192.009 ± 1.345 fewer phenotypes
remaining at the end than runs in which evolution was
allowed to continue (Linear mixed model, Chi-squared =
10123, p < .0001). Most ecological models do not include
this drop-off (MacArthur and Wilson, 1967; Kadmon and 
Allouche, 2007; Tjorve, 2010). This discrepancy is partially 
because ecological models of reserve placement are based 
on models of island biogeography, and so do not include a
phase prior to reserve placement. However, this is a superficial
distinction. The more fundamental explanation is likely
that most ecological models are built on the assumption of 
some sort of competitive equivalency between phenotypes, 
usually based on the idea that all extant phenotypes are
well-optimized to their environment. We make no such as¬ 
sumption. Instead, diversity in these experiments is stabi¬ 
lized through negative frequency dependence (due to lim¬ 
ited resources, discussed above) and co-evolutionary arms 
races, resulting in a dynamic near-equilibrium. While these 
mechanisms are more realistic, they generally mean that ad¬ 
vantages that one lineage has over another are unlikely to 
persist in the long-term. At any arbitrary point in time, there 
is probably a lineage with a slight advantage over other lin¬ 
eages. Removing mutations will eliminate the generation of 
novelty and leave this lineage at an advantage for the rest 
of the experiment. A related factor is the fact that diversity 
can only ever decrease if mutations are disallowed; a simple 
random walk under these conditions would also show some 
decrease in diversity. 
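
The last point can be checked with a toy neutral model. This is a Wright-Fisher style resampling sketch, not Avida: all individuals are equally fit, there is no mutation, and the population size and number of types are arbitrary assumptions. Richness can only stay the same or fall from one generation to the next.

```python
import random


def neutral_drift_richness(population, generations, seed=0):
    """Track richness under neutral resampling without mutation.
    Each generation, every slot is filled by copying a random individual
    from the previous generation (an assumed, simplified drift model)."""
    rng = random.Random(seed)
    richness = [len(set(population))]
    for _ in range(generations):
        population = [rng.choice(population) for _ in population]
        richness.append(len(set(population)))
    return richness


# 200 individuals spread evenly over 50 hypothetical phenotypes.
history = neutral_drift_richness(list(range(50)) * 4, generations=200)
print(history[0], history[-1])
```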

Figure 2: Richness captured within and across reserves at the start of the experiment. Red boxes show the number of phenotypes
captured within each reserve across all runs. Blue boxes show the total number of phenotypes captured across all reserves. Note
that, despite the clear positive relationship between the size of a reserve and its phenotypic richness, the highest total richness
across reserves is achieved in configurations with small reserves. Both axes are on a log scale and numeric labels indicate
means of each box.

Allowing ongoing mutations dramatically increased the
extent to which a greater number of smaller reserves promoted
higher final phenotypic richness than configurations
with a smaller number of larger reserves (see Figure 3). For
the runs in which mutations were allowed during the second
half, increasing the log of the number of reserves by
one increased the log of the final number of phenotypes
by approximately .0956 ± .00259 (Linear mixed model,
Chi-squared = 1172.6, p < .0001). This effect was substantially
weaker among runs where mutations were disallowed,
although still significantly different from 0 (Linear mixed
model, Chi-squared = 25.298, p < .0001, slope = 0.0084
± 0.0017 log(phenotype count) per unit increase in log(reserve
count)). This appears to be the result of a combination
of the three mechanisms described in the Methods section.
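
A slope on log-log axes corresponds to a power law, so these slopes can be read as multiplicative effects. The sketch below assumes the same logarithm base was used for both variables (the base then cancels) and ignores the random effects in the fitted model; the function name is hypothetical.

```python
def fold_change(slope, reserve_ratio):
    """Multiplicative change in phenotype count implied by a log-log slope
    when the number of reserves is multiplied by reserve_ratio."""
    return reserve_ratio ** slope


# Effect of doubling the number of reserves under each treatment.
with_mutations = fold_change(0.0956, 2)
without_mutations = fold_change(0.0084, 2)
print(round(with_mutations, 3), round(without_mutations, 3))
```

Read this way, doubling the number of reserves corresponds to roughly 7% more phenotypes when mutations are allowed, and well under 1% more when they are not.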

More/smaller reserves capture more phenotypes 

Configurations with a greater number of consequently 
smaller reserves captured a greater number of phenotypes 
in reserves, likely due to the substantial clumping of closely 
related organisms (high spatial autocorrelation). In the ten
initial populations, the relationship between phenotype richness
within a reserve and the size of that reserve followed
the pattern of a standard species-area relationship (see Fig¬ 
ure 2), with a positive linear relationship between the log¬ 
arithms of reserve size and reserve richness (Connor and 
McCoy, 1979). Despite this positive relationship, the to¬ 
tal phenotypic richness summed across all reserves within 
an initial environment was negatively correlated with the 
area of each of those reserves. This negative relationship 
strengthened when populations were allowed to evolve for 
100,000 updates in the reserve design, while the slope of the 
positive relationship between within-reserve richness and 
reserve area decreased slightly (see Figure 3). These effects
both weakened dramatically when mutations were disallowed,
but remained significantly different from zero. This
ability of many small reserves to sample across multiple
species pools is precisely the scenario that Simberloff and
Abele initially brought up as a counterexample to the argument
that a smaller number of larger reserves was always
preferable (Simberloff and Abele, 1976). Because organisms
disperse locally, similar phenotypes are likely to
be clumped together in space, effectively creating multiple
species pools.

Figure 3: Richness within and across reserves at the end of the experiment for both conditions. Note that the negative relationship
between reserve area and total captured richness has intensified since the beginning of the experiment for runs in which
mutations are allowed. Both axes are on a log scale and numeric labels indicate means of each box.

More/smaller reserves promote diversity maintenance

We measured the number of phenotypes that were present in 
the initial population but no longer present in the final pop¬ 
ulation, i.e. lost phenotypes (see Figure 4). When mutations 
were allowed, there was a strong positive relationship between
phenotype loss and reserve size: environments with
larger reserves resulted in greater phenotype extinction by
the end of the experiment (two-way ANOVA, F(1, 3980) =
1680.26, p < .0001, partial η² = .296). Among the runs where
mutations were disallowed, however, there was no significant
effect of reserve configuration on phenotype loss (two-way
ANOVA, F(1, 3980) = 1.17, p = .41). This discrepancy is
an example of the vast impact that allowing for evolutionary
dynamics can have.

There are a number of potential evolutionary drivers be¬ 
hind this effect, mostly related to the dynamics of selective 
sweeps in the population. In reproducing populations, some 
individuals will have more offspring than others. When one 
genotype has a substantial selective advantage over its com¬ 
petitors, selection will favor a rapid increase in that geno¬ 
type’s relative frequency, driving competitors in the popu¬ 
lation toward extinction. Such a process is referred to as a 
selective sweep, as selection sweeps less-fit variants out of 
the population (McVean, 2007). Selective sweeps typically 
dramatically reduce diversity within the affected population. 
Not only will the region of the genome under selection go to 
fixation in the population, but other variant sites on the ge¬ 
netic background where the beneficial trait first arose may 
also fix. Selective sweeps may be incomplete if, for exam¬ 
ple, a lineage encounters a competitor that is too similar in 
fitness. An incomplete selective sweep may also occur if a 
competitor exhibits negative frequency dependence and thus 
reaches an equilibrium. Even incomplete selective sweeps
can substantially reduce diversity, by making a large fraction
of the population identical, and driving competing variants
extinct (Biswas and Akey, 2006).

Figure 4: Diversity maintenance and generation across conditions. Red boxes show the number of phenotypes that were initially
captured in reserves but were not present at the end of the experiment. Blue boxes show the number of phenotypes that were
not initially captured in reserves but were present at the end of the experiment. Note the positive correlation between phenotype
loss and reserve size when mutations are allowed. Both axes are on a log scale and numeric labels indicate means of each box.

Selective sweeps occur faster in populations with higher 
strength of selection and spatial connectivity (Cantu-Paz, 
2001). Both of these factors are impacted by reserve place¬ 
ment; large unprotected regions that fall between reserves 
decrease connectivity between those reserves (see Figure 5), 
effectively breaking them into multiple subpopulations and 
thereby decreasing strength of selection (Gavrilets and Vose, 
2005). In this experiment, inter-reserve connectivity and re¬ 
serve size are inversely correlated because we are holding 
the size of the entire world constant. As a result, it is hard 
to determine the impact of each of these variables on the 
dynamics of selective sweeps in this experiment. Based on 
the empirical results, however, the effect of reduced subpop¬ 
ulation size would appear to be stronger than the effect of 
increased connectivity. 

The initial population had a significant effect on diversity 
maintenance, both when mutations were allowed (two-way
ANOVA, F(9, 3980) = 1602.45, p < .0001) and disallowed
(two-way ANOVA, F(9, 3980) = 204019.41, p < .0001).
When mutations were allowed, there was also a significant
interaction between reserve configuration and initial population
(two-way ANOVA, F(9, 3980) = 7.99, p < .0001). The
interaction term is likely significant because some reserve 
configurations happen to capture more of the initial diver¬ 
sity of any given initial population than others. There may 
also be an effect of some initial populations having higher 
starting diversity, stability against selective sweeps, or evo¬ 
lutionary potential than others. 

More/smaller reserves promote diversity generation

Figure 5: Reserve connectivity increases as patch size decreases.
Bars show percentage of offspring from an arbitrary
focal reserve that end up in the same reserve (blue), a different
reserve (green), and no reserve (red). Note that reproduction
events where a parent from one reserve has offspring in
another reserve are incredibly rare in all reserve configurations
except the two in which reserves are smallest.

We measured the number of phenotypes that were not
present in the initial population but were present in the final
population, i.e. newly evolved phenotypes (see Figure 4).
Among the runs where mutations were allowed, the count of
newly evolved phenotypes had a negative relationship with
reserve size: environments with larger reserves resulted in
fewer newly evolved phenotypes in total (two-way ANOVA,
F(1, 3980) = 278.47, p < .0001, partial η² = .065). This relationship
suggests that smaller reserves do a better job of generating
new diversity. While the effect is much weaker than
the effect of smaller reserves on diversity maintenance, it 
is still substantial (Lakens, 2013). As with diversity main¬ 
tenance, it is likely that this effect is driven by dynamics 
related to selective sweeps. When organisms are competing 
against fewer other organisms for space, as is the case in a 
smaller reserve (but see Figure 5), the strength of selection is 
weakened. This is generally believed to allow time for evo¬ 
lutionary innovation and increased diversification into new 
niches. A configuration with many small reserves can be 
thought of as roughly equivalent to an evolutionary algo¬ 
rithm that maintains multiple sub-populations and allows oc¬ 
casional migration between them. Such algorithms are gen¬ 
erally quite efficient, perhaps due to their improved ability 
to maintain and generate diversity (Tomassini, 2005). 
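
The analogy to island-model evolutionary algorithms can be made concrete with a toy example. Everything in this sketch is an assumption chosen for brevity: bit-string genomes, a fitness equal to the number of ones, binary tournament selection, a ring migration topology, and the parameter values. It is not the algorithm of any cited work.

```python
import random


def step_island(pop, rng, mut_rate=0.02):
    """One generation on one island: binary tournament selection
    followed by per-bit mutation. Fitness is the number of ones."""
    new_pop = []
    for _ in pop:
        a, b = rng.choice(pop), rng.choice(pop)
        parent = a if sum(a) >= sum(b) else b
        new_pop.append([bit ^ (rng.random() < mut_rate) for bit in parent])
    return new_pop


def migrate(islands, rng):
    """Copy one random individual from each island to the next island
    on a ring, replacing a random resident there."""
    migrants = [rng.choice(pop) for pop in islands]
    for i, migrant in enumerate(migrants):
        target = islands[(i + 1) % len(islands)]
        target[rng.randrange(len(target))] = list(migrant)


rng = random.Random(1)
# Four islands of ten random 20-bit genomes each.
islands = [[[rng.randint(0, 1) for _ in range(20)] for _ in range(10)]
           for _ in range(4)]
for gen in range(50):
    islands = [step_island(pop, rng) for pop in islands]
    if gen % 10 == 9:  # migration happens only occasionally
        migrate(islands, rng)
```

Between migration events each island evolves in isolation, which is the feature the text compares to a configuration of many small, weakly connected reserves.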

Conclusions 

We have laid groundwork for an integration of evolutionary 
dynamics into reserve design. Our results suggest that evolu¬ 
tion fundamentally changes the way that reserve placement 
affects diversity. In the presence of evolution, configurations 
with a greater number of consequently smaller reserves sub¬ 
stantially improved diversity maintenance. However, in the 
absence of evolution, there was no effect of reserve configu¬ 
ration on maintenance of the captured initial diversity. Con¬ 
figurations with a greater number of smaller reserves also 
promoted the evolution of a larger number of novel phe¬ 
notypes, a dynamic that is only possible when mutations 
are allowed. These results extend prior research on island
model genetic algorithms to address a multi-niche ecosystem,
which provides a better analog to biological ecosystems
where conservation applications are relevant. 

While our results have potentially important implications 
for conservation management decisions, it is important to 
recognize that a number of our simplifying assumptions will 
bias our results against a smaller number of larger reserves. 
None of the organisms in our experiment are, for instance, 
larger or at a higher trophic level than any other organisms. 
In nature, some organisms require vastly more space than 
others in order to have access to sufficient energy sources. 
Additionally, as all of our organisms are sessile, they are not 
at risk of wandering out of small reserves, as mobile organ¬ 
isms would be. Similarly, as the organisms considered here 
are asexual, factors such as inbreeding depression and Allee 
effects are not accounted for. This research represents the 
first step in an understanding of how evolution interacts with 
reserve placement. Other topics for future research include 
the effect of corridors, distance between reserves, placement 
of reserves in relation to spatial resources, interactions with 
motile organisms, and the impact of sexual recombination 
and gene flow on diversity in these systems. 
Abstract

A novel agent-based artificial life model for the evolution
of mimicry is presented. The model is a predator-prey
co-evolution scenario in which the pattern-representation phenotype is
simulated with Cellular Automata (CA), while pattern-recognition
behavior is implemented with a Hopfield Network. A
visual three-dimensional toroidal cube is used to construct a
universe in which agents have complete freedom of mobility,
a genetic representation of behavior, and the reproductive
capability to evolve new behaviors in successive generations.
These agents are classified as predator or
prey species. The genome of a prey species controls its mobility
and palatability, while a 2D CA is used to represent its pattern;
the rule that generates the CA is also genetically represented.
Through evolution, successive generations of prey
species develop new patterns that represent them both visually
and to the predators. Predators are agents whose primary
purpose is to provide selection pressure for the evolution of
mimicry. They are equipped with Hopfield Network memory
to recognize new CA patterns and make intelligent decisions
about whether to consume prey based on its level of palatability.
Using this construction, successful emulation of the
natural process of mimicry is achieved. The complex behavior
patterns of Batesian and Mullerian mimicry are also simulated
and studied.

Introduction 

Mimicry is a process of deception. It is an evolutionary process
by which organisms survive by deceiving
their predators. But this deception works only if the environment
contains similar-looking noxious organisms that
the predators find unpalatable. Palatable organisms come to mimic
the unpalatable ones through the process of evolution, which aids
the survival of their species. The objective of this paper is to
present an agent-based artificial life model for simulating
this natural process of the evolution of mimicry.

According to Langton, Artificial Life is ‘Life made by
Man rather than by Nature’. Taylor also defines it as a
tool for biological inquiry (Taylor and Jefferson, 1993).
In a brief survey of different AL models,
he describes Wetware systems, which work at the molecular
level, Software systems, which work at the cellular level,
and Hardware systems, which work at the organism level.


The initial contribution of software systems in artificial life 
was from John von Neumann when he designed the first 
artificial-life model (without referring to it as such), the fa¬ 
mous self-reproducing, computation-universal cellular au¬ 
tomata (Von Neumann, 1966). He tried to understand the 
fundamental properties of living systems, especially self¬ 
reproduction and the evolution of complex adaptive struc¬ 
tures, by constructing simple formal systems that exhibit 
those properties. 

Complex Adaptive Systems (CAS), a special case of complex systems,
are diverse, are made up of multiple
interconnected elements, and are adaptive in that they have the
capacity to change and learn from experience. Echo (Hraber
et al., 1997) is a class of simulation models of CAS, providing
a population of evolving, reproducing agents distributed
over a geography, with different inputs of renewable re¬ 
sources at various sites. Each agent has simple capabilities: 
offense, defense, trading and mate selection, defined by a set 
of chromosomes. Even though these capabilities are defined 
simply, they provide a rich set of variations illustrating the 
four kernel properties of CAS described by Holland (Hol¬ 
land, 1996). 

The Inspiration: Mimicry 

Henry W. Bates first published his findings about
the similarities and dissimilarities between Heliconiinae and
Ithomiinae butterflies in 1862, after 10 years of research in the
Brazilian rain forest. Bates collected ninety-four species
of butterfly and grouped them according to their similar
appearance. He found butterflies that had a similar appearance
but exhibited morphological features pointing to completely
different species, or even families. Out of the ninety-four
species, sixty-seven are now classified as Ithomiinae, while
twenty-seven of them are Heliconiinae.

Batesian Mimicry 

Even though Heliconiids are conspicuously colored, they are
extremely abundant. They are also slow-moving. Still,
predators in the surrounding area, mostly insectivorous birds,
do not prey on them, because they are inedible and unpalatable.
Because of this, other edible
and palatable species, such as Ithomiinae and Pieridae,
mimic heliconiids and thus enjoy protection.

Repulsive animals, such as heliconiids, are very conspicuously
colored. Because they are so noticeable, they are
easily recalled by predators. Their wing pattern works as a
warning to predators. Once a predator has learned that
they are inedible and unpalatable, it will probably
never attempt to eat them again. Consequently, any organism
from a closely related family or species that is edible but has
a deceptive resemblance to those conspicuously colored
species will also be avoided by the predators.

In general, the animal that predators avoid because it is
unpalatable is called the model, and the imitating
animal is called the mimic.

Mullerian Mimicry 

Bates was not able to explain some phenomena of mimicry. 
Occasionally two inedible unrelated butterfly species are 
amazingly similar in appearance. An explanation for this 
was provided by Fritz Muller in 1878. When there are multiple
inedible species, it is hard for predators to recognize
each of them and to know which ones to consume and which
ones to avoid. Because of the predators’ limited memory, all
these species still lose individuals despite being inedible.
To reduce this loss, inedible species from different families
tend to evolve a similar appearance. This phenomenon
is referred to as Mullerian mimicry, after Fritz
Muller.

Evolutionary Dynamics of Mimicry 

The dynamics of mimicry have been investigated by Turner
(Turner, 1988), who argues that the evolution of mimicry is
best explained by punctuated equilibrium rather than phyletic
gradualism. He proposed a synthetic theory (Turner, 1988),
originating with Poulton (Poulton, 1912) and Nicholson
(Nicholson, 1927), termed the two-stage model. This theory
states that mimicry normally arises in two steps. A
comparatively large mutation achieves a good approximate
resemblance to the model; it is followed by gradual
evolutionary changes that refine the resemblance, in many cases
to a high degree of perfection (Sheppard, 1972) (Ford, 1964).
This two-stage theory has been applied to the explanation of
Mullerian mimicry as well.

Mimicry Ring Any theory of Mullerian mimicry has to take into
account the coexistence of multiple mimicry rings. If we
examine the local butterfly fauna in any area of the world, we
find that among all the aposematic (warningly colored and
defended) species present there are normally only a limited
number of different patterns, far fewer than the number of
species.


Each cluster of species sharing a common pattern is termed a
Mullerian mimicry ring. Thus, in the rain forests of South and
Central America, most of the long-winged butterflies
(ithomiids, danaids, and heliconiids) belong to one of only
five different rings.

Like Batesian mimicry, Mullerian mimicry can evolve in two
stages: a mutational, one-way convergence stage followed by a
gradual, mutual convergence stage. It is worth mentioning that
in the first stage only the less protected species can adopt
the pattern of the better protected species; mutations in the
other direction are not favored.

The Model: Evolution of Mimicry 

Our model is initialized with three kinds of agents, whose
properties and behavior correspond to the model, the mimic and
the predator. We represent the evolution of the wing pattern of
the model and the mimic with Cellular Automata (CA) (Wolfram,
2002). A CA can be specified by simple rules, which can be
expressed as a binary string. The predator is equipped with a
Hopfield network (Hopfield, 1982) to give it pattern
recognition capability. Evolution takes place at the genetic
level.

A Hopfield network is an appropriate choice of memory for a
predator because its accuracy of recall falls as the number of
stored patterns grows: the more patterns are memorized, the
more errors the network tends to make. This behavior suits the
simulation of Mullerian mimicry, which arises from the limited
memory of predators. Because of this limited memory, multiple
inedible butterflies seem to converge to a single ring.

Similar to the Laws and Life project by Peter Grogono
(Grogono et al., 2003), the environment is three dimensional
and the space is toroidal.

Past Work 

Various models of mimicry have been simulated and explored.
The model by Turner (Turner and Speed, 1996) and the
mathematical model of Huheey (Huheey, 1988) tend to focus on
the selective pressure on prey brought about by the particular
learning abilities of the predator, and employ simple Monte
Carlo or mathematical approaches.

Sherratt (Sherratt, 2002) provides an innovative perspective
on the evolution of warning signals by considering co-evolving
predator and prey populations. The model’s predators are
deterministic, in that they have a fixed behavioral strategy
over their lifetime and cannot learn from experience. For both
cryptic and conspicuous prey, each predator has a fixed policy
of either attacking or avoiding.

Models by Franks and Noble The latest work on modeling the
evolution of warning signals and mimicry with individual-based
simulation is by Franks and Noble. Their initial work (Franks
and Noble, 2002) examines conditions for mimetic evolution in
an individual-based model with multiple species preyed upon by
a single abstract predator, where the appearance of each prey
species can evolve but its palatability is fixed.

In 2003 (Franks and Noble, 2003) another model for the origin
of mimicry rings was proposed, based on two working
hypotheses:

1. All of the Mullerian mimics in a given ecosystem should 
eventually converge into one large ring in order to gain 
maximum protection. 

2. If the Mullerian mimics do not converge into one large 
ring, then the presence of Batesian mimics could entice 
them to do so, by influencing the rings to converge. 

Although there are many mathematical and stochastic 
models of mimicry in the biological literature, this model 
gives attention to the evolution of mimicry ring phenomenon 
from an artificial life perspective. 

FormAL Framework 

The “FormAL framework” is a collection of concepts taken from
Peter Grogono’s Formal Artificial Life (FormAL) project
(Grogono et al., 2003) and used to build a framework for this
model. In FormAL, an Agent is a simulated organism. It is
designed simply, but is capable of reproducing using genetic
information and of modifying its genome between generations.
Agents also interact with each other while trying to survive
and reproduce in a challenging environment.

The framework consists of a three dimensional world in which
agents move freely, as determined by their genetic
representation. This toroidal space is a 3D lattice of
discrete points, divided into multiple cells, which can be
visualized. A cell is a three dimensional cubical section of
the space. The purpose of the cells is to avoid expensive
distance calculations: two agents are considered “close” enough
to interact when they are in the same cell and “distant”
otherwise. Time is an integer (t > 0) that advances in
discrete steps, and at each step the agents update themselves.

Mobility An agent’s position is calculated once during each
time step. The agent’s position, force, acceleration and
velocity are all vectors. The force is calculated from the
agent’s mobility gene, which makes some agents faster or slower
than others, and it is used to compute the agent’s
acceleration. If the force and velocity are both zero, the
agent does not move. Otherwise, Newton’s law is used to obtain
the acceleration, which is integrated to obtain the new
velocity and new position.
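
The position update described above can be sketched as follows. This is only an illustration under stated assumptions: unit mass, a unit time step, semi-implicit Euler integration, and a hypothetical lattice extent `WORLD_SIZE` are our choices, since the text does not give these details.

```python
WORLD_SIZE = 100  # hypothetical lattice extent per axis (not given in the text)


def step_motion(position, velocity, force, mass=1.0, dt=1.0):
    """Advance one agent by one time step on a toroidal 3D lattice.

    Assumes Newton's second law (a = F / m) and semi-implicit Euler
    integration; mass and dt are illustrative assumptions.
    """
    if all(f == 0 for f in force) and all(v == 0 for v in velocity):
        return position, velocity  # no force and no velocity: agent stays put
    acceleration = tuple(f / mass for f in force)
    new_velocity = tuple(v + a * dt for v, a in zip(velocity, acceleration))
    # Round to the nearest lattice point and wrap around the torus.
    new_position = tuple(
        int(round(p + v * dt)) % WORLD_SIZE
        for p, v in zip(position, new_velocity)
    )
    return new_position, new_velocity
```

The modulo operation is what makes the space toroidal: an agent leaving one face of the lattice re-enters from the opposite face.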


The Prey: Models and Mimics 

For this simulation the prey are Heliconius butterflies, and
their wing patterns are represented with cellular automata
(CA). Every prey organism contains a binary genetic
representation of a CA rule, which develops a pattern of 16 by
16 bits from its initial state. The predator uses this pattern
to identify the prey and stores its level of palatability in
memory. We chose CA because a rule is easily represented by a
binary genome, to which evolutionary operations such as
mutation and crossover can easily be applied. This 8 bit rule
gene has a decimal range of 0 to 255, and each value is
associated with a unique CA pattern. For the pattern of figure
1, the genetic representation is given by the ‘New state of
center cell’ entries of table 1. To store a pattern in Hopfield
memory we take a linear representation of the 2-D pattern, and
to measure the similarity between two patterns we calculate the
Hamming distance between their linear representations.
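
A minimal sketch of this representation is given below. The text does not state the initial CA state or the boundary condition, so a single live cell in the middle of the first row and periodic (wrap-around) boundaries are assumed here; the function names are ours.

```python
def ca_pattern(rule, width=16, height=16):
    """Develop a width x height binary pattern from an 8-bit CA rule.

    Assumptions (not stated in the text): the first row holds a single
    live cell in the middle, and the row wraps around at its edges.
    """
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(height - 1):
        # The neighbourhood (left, center, right) read as a 3-bit number
        # selects which bit of the rule gives the new center state.
        row = [
            (rule >> (row[(i - 1) % width] * 4
                      + row[i] * 2
                      + row[(i + 1) % width])) & 1
            for i in range(width)
        ]
        rows.append(row)
    return rows


def flatten(pattern):
    """Serialize a 2-D pattern row by row into one list of bits."""
    return [bit for row in pattern for bit in row]


def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))
```

For example, `hamming(flatten(ca_pattern(30)), flatten(ca_pattern(110)))` gives the dissimilarity between the Rule 30 and Rule 110 patterns under these assumptions.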



Figure 1: Cellular Automata Rule 30 


Current pattern             111  110  101  100  011  010  001  000
New state of center cell     0    0    0    1    1    1    1    0


Table 1: Cellular Automata rule 


Species diversity With the CA-based pattern representation,
prey with the same pattern can be grouped as a single species,
and by restricting inter-species reproduction we can control
the diversity of patterns. Mutation is nevertheless applied
when members of the same species mate, so new species arise
over generations from existing ones. For this reason prey
reproduction uses two separate mutation rates (default values
are given in table 2). The “Pattern Mutation Rate” controls
mutation of the first 8 bits of the genome, while the “Genome
Mutation Rate” controls mutation of the remaining 9 bits.
Multiple mutation rates applied at different genome locations
have also been used in Echo (Hraber et al., 1997).

Genome The genome of a prey organism consists of 17 bits. The
first eight bits represent the rule used to generate the CA
pattern. The next two bits represent the palatability of the
organism. The following six bits give the magnitude of the
force from which the mobility of the organism is calculated.
The 17th bit determines whether the organism is capable of
reproduction.
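
The layout of the 17 bits can be illustrated with a small decoder. This is a sketch only: the most-significant-bit-first ordering, the convention that a 1 in the last bit means "can reproduce", and the names `PreyTraits` and `decode_prey_genome` are assumptions made for illustration. The palatable/unpalatable split follows the palatability gene as described in this paper (00 and 01 palatable, 10 and 11 unpalatable).

```python
from typing import NamedTuple


class PreyTraits(NamedTuple):
    rule: int                # CA rule number, 0..255
    palatability_level: int  # 0..3
    palatable: bool
    force: int               # force magnitude, 0..63
    fertile: bool


def bits_to_int(bits):
    """Interpret a bit sequence as an unsigned integer, MSB first (assumed)."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def decode_prey_genome(genome):
    """Split a 17-bit prey genome into rule, palatability, force, fertility."""
    if len(genome) != 17:
        raise ValueError("prey genome must have 17 bits")
    level = bits_to_int(genome[8:10])
    return PreyTraits(
        rule=bits_to_int(genome[0:8]),
        palatability_level=level,
        palatable=level < 2,      # 00 and 01 palatable; 10 and 11 unpalatable
        force=bits_to_int(genome[10:16]),
        fertile=genome[16] == 1,  # assumed: 1 means capable of reproduction
    )
```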

Reflection of punctuated equilibrium Punctuated equilibrium
favors cladogenesis over gradualism, and Turner (Turner, 1988)
emphasizes punctuated equilibrium rather than phyletic
gradualism in describing the evolution of mimicry. The design
of our model follows Turner’s explanation. New CA patterns
evolve from existing ones in the prey population by a single
mutation in the pattern gene. Mimics do not come to resemble
models through a gradual process; rather, the change happens
randomly through a single-step mutation. Favored mutations
help the mimics to survive, while unfavored ones fail to
persist. Table 5 shows how the CA patterns of prey species can
have vastly different configurations for a unit change in
their pattern gene, in keeping with punctuated equilibrium.

Palatability gene The palatability of each prey species is
fixed and is represented by 2 bits of the genome (indices 8 to
9), giving four levels of palatability in the range 0 to 3.
The combinations 00 and 01 are palatable, while 10 and 11 are
unpalatable.

Interaction The prey are defined to show conglomerate behavior
in the environment. Their interaction with other prey and with
predators makes the evolution of mimicry possible. Mobility
and reproduction are the two important behaviors that result
from interaction.

Mobility The mobility gene of the prey consists of 6 bits,
which are used to calculate the force with which each prey
tries to move towards a neighboring cell. The algorithm sorts
the neighboring cells in descending order of prey count and
selects the cell with the most prey and no predators. If all
the neighboring cells contain predators, it instead chooses the
cell containing the fewest predators. This gives the prey
conglomerate behavior while they run away from predators.
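
The cell-selection rule can be sketched as below. The tuple layout `(cell_id, prey_count, predator_count)` and the function name are hypothetical, and ties are broken by list order, which the text does not specify.

```python
def choose_target_cell(neighbors):
    """Pick the neighboring cell a prey should move towards.

    neighbors: list of (cell_id, prey_count, predator_count) tuples
    (a hypothetical layout). Returns a cell_id, or None if the list is empty.
    """
    if not neighbors:
        return None
    predator_free = [n for n in neighbors if n[2] == 0]
    if predator_free:
        # Most crowded cell among those with no predators.
        return max(predator_free, key=lambda n: n[1])[0]
    # Every neighbor holds predators: take the one with the fewest.
    return min(neighbors, key=lambda n: n[2])[0]
```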

Reproduction Every prey starts reproducing when it reaches the
“Reproduction age limit”. If it is capable of reproducing,
which is decided by the 17th bit of its genome, the prey
randomly selects another prey with a similar pattern and
palatability from the same cell and mates with it, provided the
other prey is also capable of reproduction. A new prey is
created from the genomes of the two parents by single-point
crossover. Mutation is performed separately on the pattern gene
and on the rest of the genome, at two different rates given in
table 2, so the genome is mutated in two separately controlled
regions.
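
A sketch of this reproduction step follows. The text does not say whether a mutation rate applies per bit or per region; the sketch assumes that each rate is the probability of flipping one randomly chosen bit in its region, which is one possible reading of "two point mutation". The function names are illustrative.

```python
import random

PATTERN_BITS = 8  # the first 8 bits of the 17-bit prey genome hold the CA rule


def mutate_region(genome, start, stop, rate, rng):
    """With probability `rate`, flip one random bit in genome[start:stop]."""
    if rng.random() < rate:
        i = rng.randrange(start, stop)
        genome[i] ^= 1


def reproduce(parent_a, parent_b, pattern_rate=0.05, genome_rate=0.5, rng=random):
    """Single-point crossover followed by two separately controlled mutations."""
    point = rng.randrange(1, len(parent_a))
    child = list(parent_a[:point]) + list(parent_b[point:])
    mutate_region(child, 0, PATTERN_BITS, pattern_rate, rng)
    mutate_region(child, PATTERN_BITS, len(child), genome_rate, rng)
    return child
```

Keeping the pattern rate low relative to the genome rate means that most offspring stay in their parents' species while other traits vary more freely.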


Parameter                                   Value

Prey size in the 3D FormAL environment      2 to 5
Reproduction age limit                      100
Reproduction interval                       1000
Pattern Mutation Rate                       0.05
Genome Mutation Rate                        0.5
Demise Age                                  2000


Table 2: Parameters to control prey population and visibility. 

Predator 

Predators in the system provide the selection pressure on
models and mimics that drives the evolution of mimicry. Like
the prey, they are agents in the FormAL environment capable of
mobility and reproduction. In addition, they are equipped with
a Hopfield network memory so that they can learn and recognize
the patterns of the prey species. Their mobility and
reproduction capability are controlled at the genetic level,
but their memory is not, as we could not find a suitable
genetic encoding for a Hopfield network. Every new predator is
born with an empty memory and inherits nothing from its
parents’ memory. A set of parameters controls the predators’
population and learning ability in the environment (table 3).

Learning The objective of a predator’s interaction with prey
is always to consume it. Depending on the prey’s pattern and
palatability, however, the predator will either consume it or
throw it back into the environment. At this event the predator
needs to learn the pattern that represents the prey, since to
the predator the pattern stands for the palatability of the
prey species. Every time the predator makes a new interaction,
its memory is re-initialized with all the patterns it has
already encountered plus the new one. The learning procedure
used for this memory is Hebbian learning (Hebb, 1949), a
purely feed-forward, unsupervised form of learning. Initially
the weights of the Hopfield network are all set to zero. Using
the Hebbian rule, the outer product of the input-output vector
pair is calculated for each pattern, and the outer-product
matrices of all the patterns are summed to give the final
weight matrix.

Input to memory Each prey carries an evolving cellular
automaton, represented by a binary genome. Its two dimensional
pattern is serialized into a one dimensional binary array,
which is the input for any predator interacting with the prey.
This binary representation is converted to a bipolar
representation. Each input pattern consists of m x n = mn
components, one per pixel of the pattern (m and n being its two
dimensions). The m by n pattern is serialized by placing all
row vectors sequentially in one single row.
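
The storage and recall steps can be sketched in plain Python as follows. This is a textbook Hopfield formulation rather than the code used for the experiments: zeroing the diagonal of the weight matrix and updating all units synchronously are conventional choices that the text does not specify, and the function names are ours. The default of 20 recall iterations matches the “Hopfield Maximum Iterations” parameter listed for predators.

```python
def to_bipolar(bits):
    """Map a binary pattern (0/1) to a bipolar pattern (-1/+1)."""
    return [1 if b else -1 for b in bits]


def hebbian_weights(patterns):
    """Sum the outer products of all bipolar patterns (diagonal kept at zero)."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def recall(w, probe, max_iterations=20):
    """Iterate the network from `probe` until it settles or the limit is hit."""
    state = list(probe)
    for _ in range(max_iterations):
        new_state = [
            1 if sum(wij * sj for wij, sj in zip(row, state)) >= 0 else -1
            for row in w
        ]
        if new_state == state:
            break
        state = new_state
    return state
```

Because the weights are rebuilt from the full list of stored patterns, re-initializing the memory after each new encounter amounts to calling `hebbian_weights` again with the extended list.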

Predator attack algorithm As soon as a predator reaches its
attack age it selects random prey in its vicinity and starts
attacking them. The attack process also involves recognition of
the prey pattern. Because memorization and recognition are
both computationally expensive, two parameters limit them. The
“Hopfield Minimum Memory Size” is the number of patterns a
predator needs to store before it makes intelligent decisions
about attacking new prey. When a predator is born, it attacks
prey without any caution, but after every attack it stores the
prey’s pattern and palatability level in its memory. As soon as
it reaches the minimum memory size, it starts making
intelligent decisions about attacking the next prey: it tries
to recognize the pattern, and if the pattern is recalled as
palatable the prey is consumed; otherwise the prey is thrown
back into the environment. If the pattern is not recognized,
the predator tries to consume the prey and in the process
stores its palatability and pattern in memory. The predator’s
memory is limited to the “Hopfield Maximum Memory Size”. After
reaching this limit the predator stores no new patterns but
tries to associate each prey with the patterns it has already
stored.
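
The decision logic above can be sketched as follows. To keep the example short, Hopfield recall is replaced by a nearest-stored-pattern lookup by Hamming distance; this is a stand-in, not the network used in the model. Skipping patterns that are already stored, the `tolerance` parameter and the function names are likewise assumptions.

```python
MIN_MEMORY = 2   # "Minimum Memory Size" (table 3 lists 2 to 6)
MAX_MEMORY = 10  # "Maximum Memory Size"


def recognize(memory, pattern, tolerance):
    """Stand-in for Hopfield recall: palatability of the nearest stored pattern.

    Returns None when no stored pattern lies within `tolerance` bits.
    """
    best = None
    for stored, palatable in memory:
        distance = sum(a != b for a, b in zip(stored, pattern))
        if distance <= tolerance and (best is None or distance < best[0]):
            best = (distance, palatable)
    return None if best is None else best[1]


def attack(memory, pattern, palatable, tolerance=0):
    """Return True if the prey is consumed; may add the prey to `memory`."""
    pattern = list(pattern)
    if len(memory) >= MIN_MEMORY:
        remembered = recognize(memory, pattern, tolerance)
        if remembered is not None:
            # Prey recalled as unpalatable is released without tasting;
            # otherwise the meal succeeds only if the prey really is palatable.
            return remembered and palatable
    # Naive predator or unrecognized pattern: taste the prey and learn from it.
    if len(memory) < MAX_MEMORY and all(s != pattern for s, _ in memory):
        memory.append((pattern, palatable))
    return palatable
```

In this sketch a palatable prey whose pattern matches a stored unpalatable one is released untouched, which is the protection a Batesian mimic enjoys.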

Genome Each predator has a 5 bit genome. The first 4 bits are
for mobility, while the last bit controls the reproduction
capability of the predator.

Mobility The movement of a predator is calculated from its
genome. The first 4 bits, converted to decimal, give the
magnitude of the force (from 0 to 15) with which it moves
towards the largest crowd of prey within its neighborhood.
Predators have fewer mobility bits than prey (4 rather than 6)
to reduce their maximum speed. If no prey is present in the
neighborhood, this force keeps the predators moving so that
they stay distributed over all the cells. This behavior is
designed to increase predator-prey interaction in the
simulation, with one agent chasing the other.

Reproduction The fifth bit of the predator genome represents
its capability of reproduction: depending on its value, a
predator will or will not be able to reproduce. The
reproduction process for predators is similar to that for prey,
since the learning capability of predators has no genetic
representation. Like the prey, predators have a “Reproduction
Interval” and a “Reproduction Age Limit”, the minimum age a
predator has to reach before starting to mate. With these
parameters we can control the population of predators, and
through it the rate of predation on prey species.


Parameter                      Value

Minimum Memory Size            2 to 6
Maximum Memory Size            10
Hopfield Maximum Iterations    20
Attack Age                     500
Attack Interval                100
Genome Mutation Rate           0.3
Reproduction Age Limit         500
Reproduction Interval          1000 to 3000
Demise Age                     2000 to 7000


Table 3: Parameters to control predator population and pattern
recognition capability.

This model has been designed to achieve its main objective,
the evolution of mimicry, efficiently. The creation,
transformation and dynamics of different mimicry rings are
part of the model. It can also be considered a complex
adaptive system similar to Holland’s work on Echo (Holland,
1996). The seven basics of a complex adaptive system, namely
Aggregation, Tagging, Nonlinearity, Flow, Diversity, Internal
Models and Building Blocks (Holland, 1996), are present in
this model. Individual components such as the different types
of agents and their properties can be considered building
blocks. Each prey species is tagged with an individual pattern
and palatability by which predators recognize it. We give the
agents different properties, behaviors and goals but set them
free in the environment to observe their aggregate behavior,
which results in non-linear or unpredictable outcomes. The
model has its flow as it progresses in time, and there is
diversity of prey species in the environment.

The Results 

Data collection and analysis in this simulation concentrate on
evaluating whether the evolution of mimicry has taken place.
This evaluation can be made from the number of different rings
that have been created and the size of each ring, along with
the populations of palatable and unpalatable species. By
analyzing these population data it can also be established
whether Batesian and Mullerian mimicry have taken effect.

Mimicry Ring Reports 

The mimicry ring reports consist entirely of the populations
of prey species categorized according to pattern and
palatability. Data is stored at intervals of 10 iterations.
Because as many as 50 or more rings can be generated, and not
every ring population lasts for the entire simulation, we plot
only the 8 most populous surviving rings. The mimicry ring
Hamming distance between patterns is 10 % of the pattern size.

Figure 2: Population distribution of mimicry rings, initialized with 2 prey species, 10k iterations
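
As an illustration of how such a report could be tallied, the sketch below groups prey into rings by Hamming distance and counts the palatable and unpalatable members of each ring. It is a sketch under assumptions: assigning each prey to the first ring representative within the threshold is our own greedy simplification, the name `ring_report` is hypothetical, and the default threshold of 10 bits is the value quoted in the discussion of Figure 2.

```python
from collections import Counter

RING_THRESHOLD = 10  # bits; value quoted in the discussion of Figure 2


def hamming(a, b):
    """Number of positions at which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def ring_report(prey, threshold=RING_THRESHOLD, top=8):
    """Count prey per (ring, palatability) and keep the `top` largest groups.

    prey: iterable of (pattern, palatable) pairs, pattern being a bit sequence.
    Returns a list of ((ring_index, palatable), count), largest first.
    """
    representatives = []  # first pattern seen for each ring
    counts = Counter()
    for pattern, palatable in prey:
        for ring, representative in enumerate(representatives):
            if hamming(pattern, representative) <= threshold:
                break
        else:
            # No existing ring is close enough: this prey founds a new ring.
            representatives.append(pattern)
            ring = len(representatives) - 1
        counts[(ring, palatable)] += 1
    return counts.most_common(top)
```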

Initial configuration with two prey species 



Parameter                   Prey configuration            Predator configuration

Population                  Rule 110 (palatable): 108     10
                            Rule 30 (unpalatable): 108
Reproduction age limit      100                           500
Reproduction interval       1000                          1200
Pattern mutation rate       0.05                          -
Genome mutation rate        0.5                           0.3
Demise age                  2000                          2500
Minimum attack age          -                             500
Minimum memory              -                             2
Maximum memory              -                             10


Table 4: Agent configuration of 2 prey species 


The parameters in table 4 were selected as the initial
condition for this run of the simulation. The run uses two
prey species with very different Cellular Automata patterns,
opposite palatability and equal populations. To control
reproduction of the prey, their reproduction age limit is set
to 100 iterations after birth and their reproduction interval
to 1000 iterations.

The pattern mutation rate is set to a minimal level of 0.05,
since increasing it increases the number of mimicry rings
present in the simulation. The genome mutation rate controls
the rate at which the genome of a child prey deviates from
those of its parents.

Prey demise age is kept at 2000 iterations, while predator
demise age is set to 2500. Predators in this simulation
generate the selection pressure for the evolution of mimicry,
so the longer a predator is present in the simulation, the
longer it makes intelligent decisions. With this demise age for
predators we were able to create a successful mimetic
population of prey species.

The initial population of predators is set to 10, in
proportion to the prey population in the simulation. The
number is low because, unlike prey, which are consumed by
predators, predators die only of natural causes, that is, by
reaching their demise age, so the predator population can
explode very easily. For this reason their population is
restricted by a high reproduction age limit and reproduction
interval.

The plot in Figure 2 shows prey population versus simulation
time over a run of 10000 iterations. With the initial
configuration in table 4, multiple rings of prey population
are created. Two prey species are considered to be in the same
ring if their CA patterns are within a Hamming distance of 10
bits. Populations of palatable species are drawn with solid
curves and populations of unpalatable species with dotted
curves. Square, triangle and diamond markers distinguish the
species. The simulation was initialized with two prey species
with CA rules 110 and 30, palatable and unpalatable
respectively. Rules 110 and 30 were used because their
phenotypes are distinctly different from each other and the
Hopfield network distinguishes them easily. Over time CA Rule
30 comes to dominate the population (Figure 2), as most
predators recognize it as unpalatable. A palatable population
with CA Rule 30, or within the same ring, also starts rising
and at one point (time approximately 4000) overtakes the
population of CA Rule 110, which was initialized as palatable.

We can observe from the above result that the evolution of
mimicry has taken effect. A population of mimics was able to
exceed the population of other prey species because predators
avoid prey whose pattern is similar to unpalatable ones. We can
conclude that Batesian mimicry has taken effect in the
simulation.

Figure 3: Population distribution of mimicry rings, initialized with 4 prey species all unpalatable.

The number of rings in this simulation increases slowly from
2 in the initial configuration to 27 at the end of 10000
iterations. A small change in the CA genetic representation
can have a very large effect on the phenotype, the pattern
with which the prey is represented. Table 5 shows an example:
a set of almost identical pattern genotypes with vastly
different phenotypes.


CA Rule    60 = 00111100    61 = 00111101    62 = 00111110

Pattern    (pattern images not reproduced)


Table 5: Difference in prey pattern genotype and phenotype 

In table 5, rules 61 and 62 each differ from rule 60 by a
single bit, yet all three rules produce different phenotypes.
A single mutation can therefore give a child organism a
phenotype very different from that of its parent. This is
largely the reason for the large number of mimicry rings
created in the simulation. Only the 8 most populous rings are
presented in the graphs of population versus simulation time.

To evaluate the simulation at a more complex level we
increased the prey population to 900, consisting of 6 species
with very different pattern configurations. To boost
predator-prey interaction we also increased the predator
population to 30. This resulted in an enormous diversity of
species, with the total number of mimicry rings reaching
nearly 50. Details of this result can be found in (Islam,
2011).


Initial configuration with only unpalatable species 

To further observe the effects of mimicry rings we initialize
the simulation with four prey species, all unpalatable. The
minimum memory configuration is also set to four, in
accordance with the initial number of prey species. The rest
of the parameters remain largely unchanged.

The results in figure 3 are as expected: the populations of
unpalatable species have prevailed. After nearly 8000
iterations the unpalatable species with CA rules 55, 110, 30
and 190 have prevailed. Their palatable counterparts are also
increasing in population by deceiving the predators.

This experiment is an ideal scenario for observing Mullerian
mimicry, which occurs between multiple species of unpalatable
prey. From Franks and Noble (Franks and Noble, 2003), we note
that multiple Mullerian mimicry rings are expected to converge
into one large ring through the evolutionary process of
punctuated equilibrium. In this experiment, however, the
predator’s ‘Minimum Memory Configuration’ is set to four, so
every predator can learn four prey patterns before it starts
making intelligent decisions about consuming them. We
therefore set ‘Minimum Memory Configuration’ to one, increased
‘Predator Demise Age’ to 7000, decreased the predator’s
‘Reproduction Age Interval’ to 1500, and ran the simulation
for 6000 iterations. There was no sign of the prey populations
converging into one large ring: all four unpalatable prey
populations kept a very dominant presence in the simulation.
Even with predator minimum memory reduced to one pattern,
different populations of predators became familiar with
different prey patterns, which resulted in multiple Mullerian
mimicry rings instead of a single one.

In contrast, when the simulation was initialized with only
palatable species, the entire prey population was consumed by
predators at nearly 7000 iterations; details can be found in
(Islam, 2011).

Analysis 

Batesian mimicry took effect for every initial condition that
included unpalatable species: for every ring of unpalatable
species there is a palatable ring racing to reach the
population count of its unpalatable counterpart. The effects of
Mullerian mimicry are best observed in the experiment
initialized with only unpalatable prey species. We initialized
the model with 4 rings of unpalatable species and no palatable
ones, and after nearly 10K iterations all of the initial
unpalatable rings had survived with dominance. This behavior
can be explained by the minimum number of patterns that each
predator stores in memory, which was set to four. This
parameter was therefore reduced to one to observe whether all
the different unpalatable rings would converge into one large
ring when predators memorize only a single pattern. As it
turned out, the phenomenon of “a single large ring” does not
occur, because different predators recognize different
patterns, resulting in multiple divergent Mullerian mimicry
rings. It can be concluded that our results are consistent
with those of Franks and Noble (Franks and Noble, 2003), in
that multiple Mullerian mimics do not converge into one large
ring. These claims can only be made within the limits of this
simulation.

Conclusion 

Analysis of the results tells us that we have successfully
simulated the evolution of mimicry. In addition, this model
provides a more accurate simulation of the fascinating natural
process of mimicry rings and their shifts in population.
Within the limits of the simulation, the results also support
Turner’s explanation of the evolution of mimicry by punctuated
equilibrium (Turner, 1988).

Abstract

A population-based simulation framework is presented that 
allows a principled approach for exploring gender inequalities 
in professional hierarchies such as universities or businesses, 
and how they might emerge, evolve and be rectified. Results 
from a representative range of cases involving gender-based 
discrimination and intrinsic gender-based ability differences are 
presented to demonstrate the power of the approach. Such 
artificial life simulations will hopefully inspire and facilitate 
better approaches for dealing with these issues in real life. 

Introduction 

There has been much discussion in recent years about gender 
imbalance in certain professions, such as university computer 
science departments (e.g., Camp, 1997; Altonji and Blank, 
1999; Handelsman et al., 2005; Moss-Racusin et al., 2012), 
and how one might go about rectifying such situations, for 
example by better advertising or positive discrimination. 
However, it is often difficult to identify the best solutions 
when it is not clear what the main causes of the imbalance are 
(Halpern et al., 2007), and applying solutions based on
incorrect assumptions could easily make matters worse. 

One obvious potential cause of imbalance is simple 
discrimination against a particular gender (e.g., Davison and 
Burke, 2000; Moss-Racusin et al., 2012), and if that cannot be
prevented, some form of positive discrimination might be an 
appropriate remedy. Another possible cause is that one 
gender might have evolved to be intrinsically less able (either 
on average or in the tails of the distribution) in a particular 
area (e.g., Geary, 1998; Browne, 2002; Baron-Cohen, 2004; 
Halpern et al., 2007; Halpern, 2012), and that results in less
success in that area, and hence a tendency for that gender to 
avoid entering related professions in future. It is not obvious 
what interventions here would be most beneficial, or whether 
any intervention at all would be a good strategy. Another 
possibility is that, despite having intrinsically equal ability in 
the chosen area, one gender is disadvantaged by other factors, 
such as delays in career progression caused by child rearing 
and maternity leave (e.g., Ceci and Williams, 2011), and these 
cases may require different forms of intervention. 

The idea of using computer simulations to model such 
situations and explore the best strategies for intervention in 
complex processes such as these is not new (e.g., Martell, 
Lane and Emrich, 1996; Robison-Cox, Martell and Emrich, 
2007; Helbing, 2010), but what might not be so widely 
appreciated is that population-based simulations with ability- 
based selection of the type commonly used in computational 
intelligence (e.g., Engelbrecht, 2007) and artificial life (e.g., 
Bullinaria, 2009, 2010) can be effective for exploring the key 
causes, effects and solutions here. They can also model the 
evolution of such factors by natural selection. Moreover, the 
known methodological pitfalls that commonly arise with 
agent-based approaches to social and economic simulation 
(Richiardi, Leombruni, Saam and Sonnessa, 2006) are well 
understood in the field of artificial life and can thereby more 
easily be avoided. This paper presents a general framework 
for performing such simulations, and provides a selection of 
results that illustrate the power of this approach. 

The remainder of this paper is organized as follows: The 
next section describes the proposed simulation framework and 
its associated simplifications and assumptions. Then results 
from some preparatory baseline simulations are presented to 
establish appropriate values for the various free parameters. 
The next two sections show how those results differ in the 
cases of gender-based ability differences and discrimination. 
Finally, the effect of interventions, and how the evolution of 
individual preferences affects the results, are explored. The 
paper ends with some conclusions and discussion. 

Simulation Framework 

This study begins by setting a few basic principles, and then 
explores what is possible within that general framework. The 
idea is to have an evolving population of individuals, with a 
range of intrinsic (innate) abilities, who can progress during 
their lifetimes to improve their position within their chosen 
professions. To draw reliable conclusions, the simulations 
need to be kept as clear and unbiased as possible (Bullinaria, 
2009, 2010). Therefore, for the purposes of this initial study, 
a number of simplifying assumptions are made that help avoid 
any unnecessary confounding factors and also reduce the 
computational costs of the simulations to feasible levels: 

1. There are two distinct genders, which are chosen 
randomly at birth with equal probability, and overall are 
equally able. 

2. The distributions of innate individual abilities are the 
result of the evolutionary past, but are fixed for the 
duration of each simulation. 

3. There are two distinct professions, which overall are 
equally valuable. 

4. The initial individual abilities for each profession are 
determined randomly at birth and follow a normal 
(Gaussian) distribution. The means and/or the standard 
deviations of those distributions may depend on gender. 

5. If one gender has higher mean ability in one profession, 
the other gender will have an equally higher mean ability 
in the other. The effect of the magnitude of such 
differences is one of the key factors to be explored. 

6. Each individual can choose their profession randomly, or 
according to their abilities, or could have an intrinsic 
gender-based preference (i.e., probability) for choosing 
one profession over the other. Such preferences might 
emerge during the course of the simulations. 

7. Individuals grow older, potentially improve their abilities 
through experience in their profession, and eventually 
retire and leave the working population. 

8. Professional development involves a series of stages, and 
promotion between them is (by default) determined purely 
according to the best abilities currently available at each 
level (Rosenbaum, 1979). Discrimination or intervention 
in that process are other key factors to be explored. 

9. If an individual does not get promoted within a set number 
of simulated years, they are likely to give up and leave the 
working population. It is also possible that varying 
percentages of individuals leave the working population 
for other reasons. Such details will need to be explored. 

10. Individuals leaving the population are replaced by new 
individuals, and profession preferences may be based on 
the more successful individuals of previous generations. 

There clearly remains much scope for variations within this 
general framework, and what emerges will depend on the 
relative magnitudes of the various parameters involved. 
There is also scope for variations designed to investigate the 
consequences of these initial simplifications. 

The simulations follow common Artificial Life procedures. 
For each new individual in the population, a record is created 
and initialized with their innate gender, intrinsic abilities for 
the two professions, and any preferences for the professions. 
Thereafter it will be regularly updated with their age, chosen 
profession, stage in their profession, and number of years 
since reaching that stage. After updating for a number of 
simulated years, the population averages will settle down into 
a steady state, and the relevant results can be computed. 
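
The record keeping described above can be sketched as follows. This is only an illustrative sketch in Python, not the implementation used for the reported results: the names `Individual`, `new_individual`, `dif` and `choice` are hypothetical, and the convention that gender 0 has the higher mean in profession 0 (and gender 1 in profession 1) is an assumption chosen to satisfy the symmetry of assumption 5.

```python
import random
from dataclasses import dataclass


@dataclass
class Individual:
    gender: int            # 0 or 1, drawn with equal probability at birth
    abilities: tuple       # innate ability for profession 0 and profession 1
    preference: float      # probability of choosing profession 0
    profession: int = 0    # chosen profession (0 or 1)
    age: int = 20          # entry age used in the paper
    stage: int = 1         # profession stage, 1 (entry) to 7 (top)
    years_at_stage: int = 0


def new_individual(rng, dif=0.0, choice="random", preference=0.5):
    """Create one individual; `dif` is the mean ability gap in std. dev. units."""
    gender = rng.randrange(2)
    # Assumed symmetric set-up: each gender is `dif` better in "its" profession.
    means = (dif, 0.0) if gender == 0 else (0.0, dif)
    abilities = tuple(rng.gauss(mean, 1.0) for mean in means)
    if choice == "best":            # pursue the profession they are best at
        profession = 0 if abilities[0] >= abilities[1] else 1
    elif choice == "random":        # pure random choice
        profession = rng.randrange(2)
    else:                           # stochastic choice with intrinsic preference
        profession = 0 if rng.random() < preference else 1
    return Individual(gender, abilities, preference, profession)


population = [new_individual(random.Random(i), dif=1.0, choice="best")
              for i in range(10)]
```

A full simulation would create 10,000 such records and update them once per simulated year.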

In principle, the above general framework can be used to 
simulate “professions” in any species, for example food 
provision versus offspring protection in wild dogs. However, 
this paper will concentrate on abstract human professions, and 
therefore adopt human-like lifetimes and other parameters. 
It will assume, for simplicity, that all individuals enter their 
chosen profession at age 20 and retire at age 70, and that each 
profession has 7 stages, so 6 promotions are required to reach 
the top stage. Again for simplicity, it will be assumed that 
there is just one employer for each profession, so there is no 
need to simulate transfers between employers, or sub-groups 
of eligible individuals being considered at each promotion 
stage, as that is already known to bias the results (Lyness and 
Judiesch, 1999). A population size of 10,000 provides a 
sufficient number of individuals per profession per stage per 
gender for a reasonable level of competition at each stage, 
even when the distributions become skewed. The ability 
scale is measured in arbitrary units, and that will be set 
(without loss of generality) by taking the standard deviation of 
the initial Gaussian distributions to be 1.0, and measuring all 
other ability differences relative to that. 

A workable grain size for the simulations is one round of 
updates per simulated year, and 10,000 simulated years is 
plenty for all populations to settle down into a stable final 
state. Updating the individual ages, applying any ability 
increments, and replacing retired and removed individuals is 
straightforward. Dealing with the promotions between the 
stages of each profession requires further specification. One 
approach is to maintain pre-chosen numbers at each stage to 
correspond to typical companies (e.g., Robison-Cox, Martell 
and Emrich, 2007). An alternative approach, adopted here, 
is to promote a fixed fraction x of eligible individuals at each 
stage in each profession each year. Varying the promotion 
criteria and x, and requiring a certain number of years at a 
given stage before becoming eligible, are factors that will 
need to be explored empirically. Finally, an important aspect 
of this study is to incorporate into the standard setup a whole 
range of parameterized ability differences, discriminations and 
interventions that might be considered relevant. 
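
One possible reading of the fixed-fraction promotion rule is sketched below. The function name `promotion_round`, the dictionary keys, and the choice to round the number of promotions down are all assumptions made for illustration; the paper does not specify how fractional numbers of promotions are handled.

```python
def promotion_round(members, x, w=0, top_stage=7):
    """Promote the most able fraction x of eligible individuals at each stage.

    `members` is a list of dicts with keys 'ability', 'stage' and
    'years_at_stage', all belonging to one profession. An individual is
    eligible after at least w years at the current stage.
    """
    # Group the eligible individuals by stage before anyone moves, so that
    # nobody can be promoted twice in the same simulated year.
    eligible_by_stage = {}
    for m in members:
        if m["stage"] < top_stage and m["years_at_stage"] >= w:
            eligible_by_stage.setdefault(m["stage"], []).append(m)

    promoted = []
    for stage, eligible in eligible_by_stage.items():
        n_promote = int(x * len(eligible))  # assumption: round down
        eligible.sort(key=lambda m: m["ability"], reverse=True)
        for m in eligible[:n_promote]:
            m["stage"] = stage + 1
            m["years_at_stage"] = 0
            promoted.append(m)
    return promoted
```

With rounding down, a stage holding fewer than 1/x eligible individuals promotes nobody that year, which is one reason the behaviour for small x has to be explored empirically.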

The output of each simulation will usually be the final 
population of individuals, each with a gender, age, preference 
for profession, profession, ability in their chosen profession, 
and profession stage. Typically, the main factors of interest 
will be the various differences in population means between 
genders, such as how the profession preferences and numbers 
at each profession stage depend on gender (e.g., Robison-Cox, 
Martell and Emrich, 2007). Sometimes the evolution of the 
key parameter values throughout the simulation will also be of 
interest. To obtain reliable results, means and standard 
deviations over thirty runs of each simulation are computed, 
and unpaired t tests are used to determine the statistical 
significances of any differences found. 

Baseline Simulation Results 

The approach adopted is to first present the results from the 
simplest possible simulation set-up, and then systematically 
investigate how the potential variations affect those baseline 
results. Such a sequential approach also facilitates the 
setting of the various parameter values at each stage. 

The baseline case simply has the most able individuals 
promoted at each stage, with an equal promotion fraction x at 
each stage varied from 0.01 to 0.05. The resulting numbers 
at each profession stage are shown in Figure 1. For very 
small x values (0.01), no individuals reach the upper stages. 
For high values (0.04 and above), more are in the highest 
stage than in some of the lower stages. Values around 0.02 
to 0.03 probably come closest to realistic situations in 
academia or industry where there is a pyramid structure with 
fewer individuals as one moves up the hierarchy (Rosenbaum, 
1979). Having different promotion fractions x_s for each 
stage s might be required to model realistic scenarios, and that 
can easily be done, but, for simplicity, the following will 
continue with a single value x across all stages. 

The first variation requires individuals to wait at each stage 
for a certain number of years w before they become eligible 
for promotion. Now, a higher proportion of eligible 
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Figure 1: The initial baseline results showing the number of 
individuals at each profession stage and how that varies across 
a range of different promotion fractions x. 
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Figure 2: Number of individuals at each stage for promotion 
fractions x = 0.06 for different numbers of years w required at 
each stage before becoming eligible for promotion. 



Figure 3: Number of individuals at each stage for promotion 
fractions x = 0.06 and wait w = 4 for different numbers of 
years g without a promotion before giving up and leaving. 



Figure 4: Number of individuals at each stage for x = 0.06, 
w = 4 and g = 12 for each gender (GenO, Genl) with no ability 
difference (Dif 0) and one std. dev. difference (Dif 1). 


individuals need to be promoted each year to fill the higher 
stages. For a promotion fraction of 0.06, the effect of 
varying the required number of years w from 0 to 8 is shown 
in Figure 2. For 0 years, the promotion fraction is too high, 
as seen in Figure 1. Waits w of around 6 years lead to a 
reasonable distribution of individuals across the stages. 

The next variation explores the effect of giving up and 
leaving the profession if promotion is not achieved within a 
certain number of years g after becoming eligible. Now, a 
slightly shorter wait w is needed so sufficient numbers are 
eligible for promotion at each stage. For a promotion 
fraction of 0.06 and a 4-year wait for eligibility, the effect of 
varying the number of years g before giving up from 4 to 32 is 
shown in Figure 3. For g = 32 years, there is little difference 
from never giving up. For fewer years, the numbers at later 
stages fall more sharply, and since the total number of 
individuals is fixed, there are more at the initial stage. 
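
The yearly ageing, retirement and giving-up rules can be sketched as below. The function name `yearly_turnover`, the dictionary keys and the exact off-by-one conventions (for example, leaving once g full years have passed since becoming eligible, and never giving up at the top stage) are assumptions for illustration only.

```python
def yearly_turnover(members, w, g, retire_age=70, top_stage=7):
    """Age everyone by one year and split them into (staying, leaving).

    An individual leaves on reaching `retire_age`, or on having spent g
    years eligible for promotion (that is, w + g years at the same stage)
    without being promoted.
    """
    staying, leaving = [], []
    for m in members:
        m["age"] += 1
        m["years_at_stage"] += 1
        retired = m["age"] >= retire_age
        years_eligible = m["years_at_stage"] - w
        gave_up = m["stage"] < top_stage and years_eligible >= g
        if retired or gave_up:
            leaving.append(m)
        else:
            staying.append(m)
    return staying, leaving
```

Each individual in `leaving` would then be replaced by a newly created individual so that the total population size stays fixed.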

For situations that have learning or experience increase the 
individuals’ abilities in line with the number of years in their 
chosen profession, or at each stage in that profession, it will 
be interesting to investigate the different age distributions that 
emerge for each gender at each stage, and to go on to explore 
the effect of factors such as maternity leave. 

Exploring Gender-based Ability Differences 

Having seen how the three promotion parameters (x, w, g) 
affect the distribution across stages, explorations of the effect 
of gender differences can begin. The baseline simulations 
suggest that a promotion fraction of 0.06, a 4-year wait for 
eligibility, and giving up after 12 years, provides a reasonably 
realistic basis for the forthcoming simulations. Varying those 
values by small factors will inevitably change the results, but 
is unlikely to affect the general emergent patterns. 

Figure 5: Number of individuals at each stage for the same set-up 
as Figure 4, but with each individual pursuing the profession 
they are best at, rather than choosing one at random. 

Figure 6: The Dif 1 results of Figures 4 and 5 as percentages 
of the whole population of each gender, for random choice of 
profession (rand) and choice of best profession (best). 

Figure 7: The percentage of Gen1 at each stage, with and 
without ability differences (Dif 1, Dif 0), for random choice of 
profession (rand) and choice of best profession (best). 

Figure 8: The percentage of Gen1 at each stage, for equal 
(Var 1) and half (Var 0.5) ability variance, for random choice 
of profession (rand) and choice of best profession (best). 

To begin, separate distributions for the two genders (Gen0 
and Gen1) can be plotted for the case when each individual 
randomly chooses one of the two professions. Figure 4 shows 
the results for one particular profession when the gender 
difference is zero as above (Dif 0), and when the mean 
abilities differ by one standard deviation (Dif 1). Obviously, 
there is no significant gender difference in the zero difference 
case. For a unit difference, the more able gender (Gen0) for 
the given profession has more individuals at the higher stages, 
and fewer stuck at the initial stage. Dropouts also affect the 
total number of each gender participating in the profession. 
For the Dif 1 case there are 2746 (std. dev. 39) of Gen0 and 
only 2276 (std. dev. 40) of Gen1, which is a significant 
difference (t test, p < 0.01). For Dif 0, there are 2511 (std. 
dev. 54) of Gen0 and 2500 (std. dev. 46) of Gen1, with no 
significant difference (t test, p > 0.05). 
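
The significance statements for these participation counts can be checked directly from the reported summary statistics. The sketch below assumes that the quoted standard deviations are sample standard deviations over the thirty runs of each simulation, and uses the equal-sample-size form of the unpaired t statistic; the function name `unpaired_t` is hypothetical.

```python
import math


def unpaired_t(mean_a, sd_a, mean_b, sd_b, n):
    """Unpaired t statistic for two samples of equal size n (2n - 2 d.o.f.)."""
    standard_error = math.sqrt((sd_a ** 2 + sd_b ** 2) / n)
    return (mean_a - mean_b) / standard_error


# Participation counts for the two genders, thirty runs each.
t_dif1 = unpaired_t(2746, 39, 2276, 40, 30)  # one std. dev. ability difference
t_dif0 = unpaired_t(2511, 54, 2500, 46, 30)  # no ability difference
print(round(t_dif1, 2), round(t_dif0, 2))
```

Under these assumptions the statistics are roughly 46 and 0.85. With 58 degrees of freedom the two-sided critical values are about 2.0 (p = 0.05) and 2.66 (p = 0.01), so the first difference is significant at p < 0.01 and the second is not significant at p = 0.05, in line with the text.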

If, rather than choosing their profession randomly, each 
individual were to choose the profession for which they have 
the best ability, the outcome is rather different as shown in 
Figure 5. Again there is no significant gender difference for 
the Dif 0 case, but for Dif 1 there is a massive, statistically 
significant (t test, p < 0.01) reduction in the number of Gen1 
individuals choosing the profession, 1114 (std. dev. 44) 
compared to 3893 (std. dev. 52) for Gen0. Figure 6 shows 
the Dif 1 results for random choice of profession (rand) and 
choice of best profession (best), as percentages of the whole 
population of each gender at each stage in the profession. 
The ability-based choice of profession brings the gender 
distributions a little closer together, but Gen1 still has much 
reduced numbers at the higher stages compared to Gen0. 

Another way of representing the data is as the percentages 
of each gender at each stage. Since these always total 100%, 
it is sufficient to present the results only for Gen1. These are 
shown in Figure 7 for best and random profession choice. 
For no gender difference (Dif 0), the percentage of Gen1 at 
each stage is not significantly different to 50%. For Dif 1 
and random profession choice, the proportion at stage 1 is 
slightly over 50% (due to weaker individuals waiting for a 
promotion that never comes) and then falls for later stages. 
When the best profession is chosen, there is a lower starting 
point, and a slower fall off at later stages. A gender-based 
ability difference leads to stage distribution percentage 
differentials even when self selection leads to reduced 
participation of the less able gender. This kind of pattern, 
known as a shrinking pipeline, is found in real populations 
(Camp, 1997), though not necessarily for the same reason. 

Interestingly, similar shrinking pipelines can arise even 
when there is no difference between genders in their mean 
abilities. If the variance in abilities for Gen1 is less than that 
of Gen0, as apparently happens with some human skills (e.g., 
Humphreys, 1988), that can give Gen1 a disadvantage at later 
stages of promotion, even if the means are the same for each 
gender. Figure 8 shows the effect of a factor of two in 
ability variances for random and best choices of profession. 
Similar patterns also arise when there are combinations of 
mean and variance differences. It is clear that there are 
many possible types of gender differences in ability that can 
account for the unequal gender distributions observed in real 
professions. From the simulation point of view, one can add 
further realism by replacing the simple Gaussian distributions 
used here with something more appropriate, but determining 
what those distributions should be might not be so easy (e.g., 
Benbow, 1988). Of course, the observed differences may 
also occur when there are no ability differences at all, and that 
is what will be investigated next. 
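
A short worked calculation illustrates why a variance difference alone can produce a shrinking pipeline. It computes the fraction of a Gaussian ability distribution lying above a fixed threshold; the threshold of 2.0 units is an arbitrary illustrative choice, not a value from the simulations, and the function name `tail_fraction` is hypothetical.

```python
import math


def tail_fraction(threshold, mean=0.0, variance=1.0):
    """Fraction of a normal(mean, variance) population above `threshold`."""
    z = (threshold - mean) / math.sqrt(variance)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


equal_var = tail_fraction(2.0, variance=1.0)  # unit ability variance
half_var = tail_fraction(2.0, variance=0.5)   # half the ability variance
print(equal_var, half_var, equal_var / half_var)
```

For this threshold, about 2.3% of the unit-variance group lies above it, compared with about 0.23% of the half-variance group, even though the two means are identical. Promotion that repeatedly selects from the upper tail therefore favours the higher-variance group more strongly at each successive stage.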


Exploring Gender-based Discrimination 

Perhaps the question of most practical importance is: how do 
the above patterns of gender differences vary when, rather 
than any intrinsic ability difference, there is discrimination 
against a particular gender? Given the range of possibilities, 
it is not feasible to study all types of discrimination here, nor 
the potential reasons for them. However, to demonstrate the 
power of the simulation framework, it is sufficient to consider a 
simple abstract case. In particular, suppose an individual of 
one gender had to be vastly superior to a rival of the other 
gender before being promoted before them. That might, for 
example, arise due to different perceived prior probabilities of 
the abilities for the two genders (that are not necessarily 
correct) being used in conjunction with the actual evidence 
submitted with the promotion application. It could also be 
indirect, rather than direct, discrimination, for example due to 
one gender being less likely to be awarded prestigious invited 
talks or prizes (e.g., Gürer and Camp, 2001), or due to the 
promotion criteria being skewed in favour of one gender (e.g., 
Schneider, 1998; Ginther and Hayes, 2003; Mixon and 
Trevino, 2005). To be specific, the above simulations were 
re-run with a Gen1 individual only being promoted in 
preference to a Gen0 individual in the profession of interest 
when their ability is at least one unit higher. The symmetry 
was maintained by having discrimination in the opposite 
direction for the other profession. That leads to exactly the 
same stage distribution as in Figure 4 for the case when the 
Gen1 ability distribution really was one unit lower. 
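
One way to express this discrimination rule is as a handicap applied to the perceived ability of the disfavoured gender when candidates are ranked, which also makes clear why the resulting stage distribution matches that of a genuine one-unit ability difference. The sketch below is an assumed implementation; the names `rank_for_promotion`, `handicap` and `disfavoured` are hypothetical.

```python
def rank_for_promotion(candidates, handicap=1.0, disfavoured=1):
    """Order candidates for promotion under gender-based discrimination.

    A candidate of the disfavoured gender is ranked as if their ability
    were `handicap` units lower, so they are only placed ahead of a rival
    of the other gender when their true ability is at least that much
    higher. At exactly `handicap` units the ranking is tied and the
    input order decides (Python's sort is stable).
    """
    def perceived(c):
        penalty = handicap if c["gender"] == disfavoured else 0.0
        return c["ability"] - penalty

    return sorted(candidates, key=perceived, reverse=True)


candidates = [
    {"gender": 0, "ability": 1.0},
    {"gender": 1, "ability": 1.9},  # better than the rival, but by less than one unit
    {"gender": 1, "ability": 2.1},  # more than one unit better
]
ranked = rank_for_promotion(candidates)
```

Because the ranking depends only on the perceived values, the numbers promoted from each gender are the same as if the true abilities had been shifted, while the true abilities of those promoted are not.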

What is different between the discrimination and ability 
difference cases is the average abilities at each stage. A 
similar pattern emerges for both random profession choice 
and ability-based choice, though choosing according to ability 
not surprisingly leads to better ability levels throughout. In 
the ability difference case (Dif 1), the ability of the weaker 
Gen1 is lower than Gen0 at the entry stage 1, but the ability- 
based promotions lead to much closer ability levels at later 
stages. In the discrimination case (Disc 1), the Gen0 abilities 
are reduced to the same degree as the Dif 1 case, because 
there is less competition for promotion, but the Gen1 abilities 
are much higher, due to the extra ability required to achieve 
promotion. Similar, though smaller (half a std. dev.), gender 
differences in ability have been observed in real corporations, 
suggesting the presence of gender discrimination there 
(Lyness and Heilman, 2006). That, of course, does not mean 
there is necessarily a gender discrimination based glass ceiling 
in all cases, but there is certainly evidence consistent with that 
existing elsewhere too (e.g., Sabatier, 2010). 

Figure 9: Percentage of Gen1 at each stage, with no ability 
or discrimination differences, starting at half or quarter total, 
for profession chosen randomly (rand) and by ability (best). 

It is often suggested that indirect forms of discrimination 
are discouraging young women from entering particular 
professions, such as computer science, in the first place (e.g., 
Gürer and Camp, 2001). This effect can be modelled too, 
with the above simulations run in the same way whatever 
factor is reducing the numbers entering the given profession. 
The simplest possible case has no other promotion-based 
discrimination and no intrinsic ability differences, and the 
results in Figure 9 show for two rather different starting 
fractions that, as long as the profession is chosen randomly, 
those fractions persist throughout the stages. However, if 
the individuals that choose the profession are doing so 
according to their best abilities, they are likely to be at the 
higher end of the ability distribution, and fair promotions will 
allow them to rise quickly through the stages so that the 
gender proportions become equalized at the highest stages. 

Figures 7, 8 and 9 demonstrate how rather different pipeline 
patterns emerge depending on the situation simulated. These 
are the “pure” cases. In practice, there is likely to be more 
than one form of ability or discrimination difference present, 
and untangling the various factors will be a challenge. This 
is where the simulation approach proposed here will prove 
most useful, as it enables all the possible combinations and 
variations to be simulated relatively easily and reliably, with 
the inevitable interactions accommodated automatically. 


Exploring Intervention Policies 

The shrinking pipelines and gender differences in the numbers 
entering some professions are often argued to be important 
issues that need addressing. For example, the lack of women 
in certain higher stages of academia might discourage women 
from studying those subjects and that may lead to critical 
skilled-worker shortages in some areas (e.g., Camp, 1997). 

Figure 10: The average overall abilities at each stage for the 
standard and intervention populations for the three situations. 
The Base, Base Int and Disc Int results are identical. 

Figure 11: Average positions (profession stages) achieved 
by one particular gender while strong profession preferences 
emerge. The speed of change is parameter dependent. 

The classic intervention would be to simply make sure that 
the numbers of each gender at each stage of each profession 
are as equal as possible. That can be implemented easily by 
ranking the eligible individuals of each gender separately, and 
promoting equal numbers of each gender to the next stage to 
give the required total number of promotions overall. Doing 
that in the above simulation framework leaves no significant 
differences in numbers between the baseline, gender-based 
ability difference, and gender-based discrimination cases. 
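
The equal-numbers intervention can be sketched as follows. The function name `promote_with_quota` is hypothetical, and the handling of odd totals or of a gender with too few eligible candidates (any leftover slots go to the best remaining candidates of either gender) is an assumption, since the text only requires the numbers to be as equal as possible.

```python
def promote_with_quota(eligible, x):
    """Promote a fraction x of `eligible`, split equally between genders.

    `eligible` is a list of dicts with keys 'gender' (0 or 1) and 'ability'.
    Each gender is ranked separately and supplies half of the promotions.
    """
    total = int(x * len(eligible))  # assumption: round down
    by_gender = {0: [], 1: []}
    for c in eligible:
        by_gender[c["gender"]].append(c)

    promoted, remaining = [], []
    per_gender = total // 2
    for group in by_gender.values():
        group.sort(key=lambda c: c["ability"], reverse=True)
        promoted.extend(group[:per_gender])
        remaining.extend(group[per_gender:])

    # Assumed tie-break: fill any leftover slots on ability alone.
    remaining.sort(key=lambda c: c["ability"], reverse=True)
    promoted.extend(remaining[: total - len(promoted)])
    return promoted
```

When one gender has uniformly lower abilities, this rule still promotes equal numbers from each, which is what lowers the average ability at the higher stages in the ability-difference case.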

This intervention results in the expected differences in the 
corresponding abilities. All groups have the same average 
ability at each stage, except the less able Dif 1 Gen1 case 
which is one unit below at all stages, because equal numbers 
of promotions are taking place despite the lower ability levels 
of that gender. That leads to the important practical question: 
what is the average ability of the individuals at each stage, 
irrespective of their gender? That is shown in Figure 10. The 
baseline (Base) and baseline with intervention (Base Int) 
results are identical, since there is no gender imbalance for the 
intervention to correct, and these exhibit the best average 
abilities overall. The discrimination (Disc) case is slightly 
worse, since it unfairly allows weaker Gen0 individuals into 
the upper stages, rather than more able Gen1 individuals. 
The discrimination with intervention (Disc Int) case is no 
different to the base case, since the intervention successfully 
corrects the discrimination-based imbalance, and once again 
allows the best individuals at each stage to be promoted. 
The innate ability difference (Dif) case is overall worse than 
the base and discrimination cases, because that corresponds to 
Gen1 individuals having lower abilities than the base case, 
and that inevitably brings the population averages down. 
Obviously, if the gender difference corresponded to improved 
abilities for Gen0 over the baseline, rather than reduced 
abilities for Gen1, that would lead to improved population 
averages over the base case. The important question is: what 
will the consequences of intervention be in this case? As 
Figure 10 clearly shows, this makes the overall population 
performance (Dif Int) worse, particularly at the higher stages, 
since it forces the promotion of weaker Gen1 individuals over 
better Gen0 individuals. This highlights the importance of 
understanding the problem before trying to correct it. 

Evolving Preferences for Professions 

So far, the simulations have been run for many generations to 
allow enough time for the various population distributions to 
stabilize, but none of the innate properties or preferences have 
been allowed to evolve or change from one generation to the 
next. However, the steady-state evolutionary computation 
approach underlying the general framework proposed here can 
automatically allow any inherent parameters (such as gender- 
based abilities, or preferences for particular professions) to 
evolve by natural selection if required (Bullinaria, 2009). In 
some species, such factors may be encoded genetically, but 
for current human professions it is more likely that such 
information will be passed on mimetically, in the form of 
social learning or mimicry (Bullinaria, 2010). For example, 
low take-ups of particular professions could emerge as a 
sensible reaction to poor progress in that profession by 
members of their gender in previous generations. There are 
clearly many such factors that could usefully be explored in 
the proposed simulation framework, and many ways they 
could be implemented, but one simple example should suffice 
to demonstrate the power of the simulation approach. 

Suppose individuals choose their professions stochastically, 
rather than according to their abilities, but with particular 
intrinsic probabilities. The initial population would have 
equal preferences for the two professions, but individuals in 
later generations will have preferences that vary according to 
the success of recently replaced individuals. There are many 
ways that can be implemented, but one approach is enough to 
illustrate what typically emerges: each new individual copies 
the preference probabilities of the more successful of the last 
two replaced individuals (i.e. the one with the highest final 
position) but with a random “mutation” added from the range 
[-0.02, 0.02]. Those small mutations are sufficient to allow 
the preferences to drift away from the symmetric 0.5 values if 
a final position advantage emerges from doing so. 
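
The preference-copying rule just described can be sketched as below. The function name `inherit_preference` and the dictionary keys are hypothetical, and two details are assumptions not stated in the text: the mutation is drawn uniformly from the stated range, and the result is clipped so that it remains a valid probability.

```python
import random


def inherit_preference(replaced_a, replaced_b, rng, mutation=0.02):
    """Return a newcomer's preference probability for one profession.

    The newcomer copies the preference of whichever of the last two
    replaced individuals reached the higher final stage, then adds a
    small random mutation.
    """
    if replaced_a["final_stage"] >= replaced_b["final_stage"]:
        model = replaced_a
    else:
        model = replaced_b
    preference = model["preference"] + rng.uniform(-mutation, mutation)
    return min(1.0, max(0.0, preference))  # assumption: clip to [0, 1]


rng = random.Random(1)
newcomer_pref = inherit_preference(
    {"final_stage": 5, "preference": 0.7},
    {"final_stage": 2, "preference": 0.3},
    rng,
)
```

Since the preferences are gender-based, a full simulation would presumably apply this within each gender; that detail is left open here.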

There are eighteen distinct combinations to consider: three 
basic conditions (baseline, gender-based ability difference, 
and gender-based discrimination), three profession choice 
approaches (by best ability, purely random, and random with 
intrinsic preferences), and each case can either involve, or not 
involve, intervention to equalize the numbers of each gender. 
Clearly, all the baseline cases, all the ability-based and pure 
random profession choice cases, and all the intervention cases, 
lead to the average preferences remaining at 0.5 because there 
is nothing to drive natural selection away from that symmetric 
case. However, those cases were still run to provide a check 
that no unexpected biases exist in the simulations. If there is 
no intervention, both the ability difference and discrimination 
case preferences shift towards the profession where the 
greatest success is most likely, while the preferences for the 
baseline case remain near 0.5 as expected. This is probably 
the simplest explanation of many of the observed gender 
differences in the numbers entering certain professions. 

Figure 12: Ability levels for each gender (Gen0, Gen1) in the 
baseline (Equal), ability difference (Dif 1) and discrimination 
(Disc 1) cases, for preference-based profession choice. 

Figure 13: Ability levels for each gender in their appropriate 
profession for the ability difference (Dif 1) or discrimination 
(Disc 1) cases, for the three profession choice approaches. 

Since the professions and genders are set up symmetrically, 
and there is a fixed promotion rate at each stage, the overall 
average position at any given time for each gender must be 
independent of all the other factors, including any changing 
profession preferences, so it is not obvious what the changes 
really are optimizing. Figure 11 shows the average positions 
achieved by one gender as the preferences emerge when either 
an ability difference or discrimination favors Prof2 advances. 
The average positions achieved in both professions decrease 
as a result of the preferences changing away from being equal. 
Both genders gravitate towards the profession they do best at, 
increasing the competition there, and reducing the average 
position there for their gender. Those individuals remaining 
in the other profession face a larger pool of competitors of the 
better performing gender, and they are worse off on average 
too. It is the higher numbers in the best profession for each 
gender that keeps the average position constant throughout. 

Often the most important issue for the businesses concerned 
is the average abilities at each stage in the two professions, 
irrespective of the genders involved. Figure 12 shows the 
ability levels at each stage after strong preferences emerge. 
Each profession employs individuals almost exclusively of 
just one gender, so the effects of gender-based discrimination 
or ability differences are very small, and all the ability levels 
converge, except for the very few individuals who persist in 
the profession that discriminates against them. 

A final question of interest here is how does the emergence 
of profession preferences affect the society as a whole, given 
that they have been driven purely by individuals wanting to 
reach higher positions in their chosen profession. Figure 13 
compares the ability levels for each gender in their most 
appropriate profession. The best abilities overall come from 
ability-based profession choice, and the worst abilities come 
from a random profession choice. That is to be expected, 
given that random choice means many individuals will not 
end up performing according to their best potential. The not- 
so-obvious result is that emergent profession preferences are 
able to bring the ability levels close to the ability-based levels, 
particularly for the higher stages. This might be important 
for the population as a whole if the abilities are difficult to 
assess before the profession choice needs to be made. 

Conclusions and Discussion 

A general population-based framework has been proposed that 
enables the simulations of gender-based differences in various 
professions that involve ability-determined promotions up 
some form of hierarchy. The representative results presented 
were primarily chosen to demonstrate that the models do lead 
to reliable results in key simplified scenarios, though the 
approach can also be used to generate novel results for known 
real-world scenarios. The simulations presented have served 
to show how the principal factors can be studied effectively 
within the framework, and illustrated how distinct causes can 
lead to indistinguishable consequences, how preferences are 
able to emerge by natural selection, and how inappropriate 
interventions can make matters worse rather than better. 

To simplify the presentation, all the simulations reported in 
this paper have only involved two professions, and all the 
gender-based differences have been symmetric across those 
professions. In reality, of course, there are many more than 
two professions, and a distinct lack of symmetries, but the 
proposed simulation framework is general enough to cope 
easily with such complications. The results presented in this
paper will then serve as the baseline against which those more 
realistic simulations can be compared. 

There are clearly many other factors that could be built into 
the simulations, so hopefully this modeling approach will 
become more widely used in the future. A key issue is that 
there are too many potential gender-based effects to simulate 
all the possible combinations, but there are numerous specific 
hypotheses that could be tested empirically with the approach. 
One concerns the effect of gender differences in risk taking 
leading to differences in the variance of abilities (e.g., 
Schubert, 2006; Robison-Cox, Martell and Emrich, 2007). 
Another relates to distinct career paths to the top levels of 
some professions, with different gender effects on each (e.g., 
Robison-Cox, Martell and Emrich, 2007). It is also easy to 
fix the number of individuals at each level, rather than let it 
emerge by promoting a fixed fraction of eligible individuals 
each year, and that would lead to simulations more like those 
of the corporate management study of Robison-Cox, Martell 
and Emrich (2007) than the merit-based promotions more 
typical in academia. The consequences of other suggestions 
could also be explored, such as that women are less aggressive 
about seeking promotion, or are quicker to give up waiting for 
promotion, or more likely to leave or take time out for other 
reasons such as maternity leave, etc. All these ideas could be 
tested explicitly within the presented framework. 

There are a number of further computational complexities 
that could relatively easily be incorporated into the simulation 
approach presented in this paper, that have previously been 
tested in simulations of Life History Evolution (Bullinaria, 
2009, 2010), such as allowing abilities and preferences that 
change with time, or having parameter value distributions 
rather than parameters fixed at particular values. Hopefully, 
however, this short paper has been sufficient to demonstrate 
that the general framework proposed will allow all manner of 
additional factors to be explored in a more systematic manner 
than previously, in which the assumptions and simplifications 
are explicit, and the effects quantifiable. It is inevitable that 
some readers will disagree with the particular assumptions 
and simplifications employed in the simulations presented 
here. Hopefully progress can be made by other researchers 
using the approach to test the consequences of varying those 
assumptions and simplifications, and performing simulations 
more carefully matched to their own data and beliefs. 

References 

Altonji, J.G. and Blank, R.M. (1999). Race and gender in the labor
market. Handbook of Labor Economics, 3C, 3143-3259.

Baron-Cohen, S. (2004). The Essential Difference. London, UK:
Penguin.

Benbow, C.P. (1988). Sex differences in mathematical reasoning 
ability in intellectually talented preadolescents: Their nature, 
effects, and possible causes. Behavioral And Brain Sciences, 11, 
169-183. 

Browne, K.R. (2002). Biology at Work: Rethinking Sexual Equality.
New Brunswick, NJ: Rutgers University Press.

Bullinaria, J.A. (2009). Lifetime learning as a factor in Life History 
Evolution. Artificial Life, 15, 389-409. 

Bullinaria, J.A. (2010). Memes in artificial life simulations of Life
History Evolution. In: Proceedings of Artificial Life XII
Conference (Alife XII), 823-830. Cambridge, MA: MIT Press.

Camp, T. (1997). The incredible shrinking pipeline. Communications 
of the ACM, 40, 103-110. 

Ceci, S.J. and Williams, W.M. (2011). Understanding current causes 
of women’s underrepresentation in science. Proceedings of the 
National Academy of Sciences, 108, 3157-3162. 

Davison, H.K. and Burke, M.J. (2000). Sex discrimination in 
simulated employment contexts: A meta-analytic investigation. 
Journal of Vocational Behavior, 56, 225-248. 

Engelbrecht, A.P. (2007). Computational Intelligence: An
Introduction. Sussex, UK: Wiley.

Geary, D.C. (1998). Male, female: The evolution of human sex
differences. Washington, DC: American Psychological
Association.

Ginther, D. and Hayes, K. (2003). Gender differences in salary and 
promotion for faculty in the humanities 1977-1995. Journal of 
Human Resources, 38, 34-73. 

Gürer, D. and Camp, T. (2001). Investigating the incredible shrinking
pipeline for women in computer science. Final Report NSF 
9812016. 

Halpern, D.F. (2012). Sex Differences in Cognitive Abilities (4th 
Edition). New York, NY: Psychology Press. 

Halpern, D.F. et al. (2007). The science of sex differences in science
and mathematics. Psychological Science, 8, 1-51.

Handelsman, J. et al. (2005). More women in science. Science, 309,
1190-1191. 

Helbing, D. (2010). Quantitative Sociodynamics. Berlin, Germany: 
Springer. 

Humphreys, L.G. (1988). Sex-differences in variability may be more 
important than sex differences in means. Behavioral and Brain 
Sciences, 11, 195-196. 

Lyness, K.S. and Heilman, M.E. (2006). When fit is fundamental: 
Performance evaluations and promotions of upper-level female and 
male managers. Journal of Applied Psychology, 91, 777-785. 

Lyness, K.S. and Judiesch, M.K. (1999). Are women more likely to 
be hired or promoted into management positions? Journal of 
Vocational Behavior, 54, 158-173. 

Martell, R.F., Lane, D.M. and Emrich, C. (1996). Male-female
differences: A computer simulation. American Psychologist, 51, 
157-158. 

Mixon, F. and Trevino, L. (2005). Is there gender discrimination in 
named professorships? An econometric analysis of economics 
departments in the US South. Applied Economics, 37, 849-854. 

Moss-Racusin, C.A., Dovidio, J.F., Brescoll, V.L., Graham, M.J. and
Handelsman, J. (2012). Science faculty’s subtle gender biases 
favor male students. Proceedings of the National Academy of 
Sciences, 109, 16474-16479. 

Richiardi, M., Leombruni, R., Saam, N. and Sonnessa, M. (2006). A 
common protocol for agent-based social simulation. Journal of 
Artificial Societies and Social Simulation, 9, 1-15. 

Robison-Cox, J.F., Martell, R.F. and Emrich, C.G. (2007). Simulating 
gender stratification. Journal of Artificial Societies and Social 
Simulation, 10, 1-8. 

Rosenbaum, J.E. (1979). Tournament mobility: Career patterns in a 
corporation. Administrative Science Quarterly, 24, 220-241. 

Sabatier, M. (2010). Do female researchers face a glass ceiling in 
France? A hazard model of promotions. Applied Economics, 42, 
2053-2062. 

Schneider, A. (1998). Why don’t women publish as much as men? 
Chronicle of Higher Education, 45, 14-16. 

Schubert, R. (2006). Analyzing and managing risks - On the 
importance of gender differences in risk attitudes. Managerial 
Finance, 32, 706-715. 





Job Insecurity in Academic Research Employment: 
An Agent-Based Model 

Eric Silverman¹, Nic Geard² and Ian Wood¹

¹Teesside University
²University of Melbourne
e.silverman@tees.ac.uk


Abstract 

This paper presents an agent-based model of fixed-term academic
employment in a competitive research funding environment
based on UK academia. The goal of the model is
to investigate the effects of job insecurity on research
productivity. Agents may be either established academics who
may apply for grants, or postdoctoral researchers who are
unable to apply for grants and experience hardship when reaching
the end of their fixed-term contracts. Model results show
that in general adding fixed-term postdocs to the system
produces less total research output than adding half as many
permanent academics. An in-depth sensitivity analysis is
performed across postdoc scenarios, and indicates that promoting
more postdocs into permanent positions produces significant
increases in research output.

Introduction 

In recent decades the career landscape for academics has
changed markedly. Upon graduating with a PhD, many aspiring
academics enter a series of fixed-term postdoctoral
research fellowships. Permanent positions are increasingly
difficult to come by - in the UK, for example, only 3.5% of
PhD graduates will succeed in getting an academic position
(Powell, 2015). Intense competition for academic posts coupled
with ever-increasing numbers of PhD graduates means
that the academic workforce in the UK has shifted substantially
toward fixed-term contracts - currently 68% of researchers
in the UK are on fixed-term contracts (University
and College Union, 2015).

Much has been written about the impact insecure academic
working conditions can have on the individual. According
to the University and College Union report Making
Ends Meet - The Human Costs of Casualisation in Higher
Education, some 21% of UK academics on fixed-term or
zero-hour contracts have trouble putting food on their dinner
tables, despite many of these individuals having higher
degrees and substantial experience. In the United States,
adjunct professor positions - casualised positions in which
teaching staff are paid per course or by the hour at very
low rates, often without health insurance - now make up
the overwhelming majority of academic positions on offer.
Some 75% of US academics are now ‘contingent teaching
faculty’, or adjuncts, a ten-fold increase since 1975 (Hoeller,
2014).

In the case of funding, evidence suggests that the current
trajectory of academia - toward larger grants targeted at
‘research leaders’, which tend to bring more fixed-term postdocs
with them - is not necessarily a productive one. A
study of the projects funded by the Natural Sciences and
Engineering Research Council of Canada found that scientific
impact was a decreasing function of funding - bigger projects
produced less impact per dollar than smaller ones (Fortin and
Currie, 2013). Similarly, a recent study of 398 project PIs in
the UK found that while productivity - number of publications -
was positively correlated with funding, the relationship
with impact factor and citation number was far weaker,
and diminishing returns set in as funding levels rise (Cook
et al., 2015).

While there have been modelling studies examining competitive
research funding systems and illuminating some of
these shortcomings (Geard and Noble, 2010), currently we
are unaware of any attempts to model the structure of academic
careers. This seems a significant oversight given the
prevalence of stress and job dissatisfaction reported by postdocs
worldwide (Van der Weijden et al., 2015). In such
circumstances, could the stress and uncertainty of postdoc
employment have a substantial impact on research productivity
in academia?

Given that the majority of postdocs are hired with grant
funding, we suggest that understanding the impact of the
trend towards fixed-term contracts will require an examination
of both competitive research funding structures and
insecure postdoctoral employment. This paper presents a
first attempt at modelling a simple academic system which
incorporates both of these key elements. We propose that
modelling techniques drawn from complex systems science
are highly appropriate for this kind of meta-research, as we
need to explicitly represent the complex and inter-related nature
of sector-wide constructs like research funding councils
and academic career paths.

Utilising previous work on modelling research funding
(Geard and Noble, 2010), we have constructed an agent-based
model in which established academics compete for
grants while postdocs compete for tenure. By examining the
impact of these interrelated systems on overall research
productivity we seek a deeper understanding of how the trend
toward fixed-term contracts in academic employment may
have affected the academic community. Performing a detailed
sensitivity analysis allows us to examine the impact of
key model parameters on overall research output.

The Simulation Model 

The simulation model used here is based substantially on
previous work by Geard and Noble (2010). The same code
base was used as a starting point, and the postdoc employment
mechanisms were added on top of this core functionality.
Model parameters and postdoc behaviour were inspired
by the characteristics of UK academia. A brief summary of
the funding model will be provided here; for further details,
please see the original paper.

Core Research Funding Model 

The model represents research funding as a competitive bidding
process, in which academic agents submit proposals
each semester in the hope of obtaining a grant. Here we assume
that 30% of research proposals are funded, so agents
attempt to get their research funded by investing time into
bid preparation. The grant evaluation process in the model
evaluates proposals based on the research quality of the
submitting agent and the amount of time spent preparing the bid
- we also assume diminishing returns on time spent.
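The evaluation step described above can be sketched as follows. This is a minimal illustration only: the square-root form of the diminishing returns, the ranking rule, and the names `proposal_score` and `fund_proposals` are assumptions made for this example, not the scoring function used in the model of Geard and Noble (2010).

```python
import math

FUNDED_FRACTION = 0.30  # from the text: 30% of proposals are funded


def proposal_score(research_quality, bid_time):
    """Hypothetical score: quality scaled by a concave function of bid time.

    The square root is an assumed stand-in for 'diminishing returns on
    time spent'; the functional form in the original model may differ.
    """
    return research_quality * math.sqrt(bid_time)


def fund_proposals(proposals, funded_fraction=FUNDED_FRACTION):
    """Return the agent ids of the top-scoring proposals.

    proposals: list of (agent_id, research_quality, bid_time) tuples,
    with research_quality and bid_time both in [0, 1].
    """
    ranked = sorted(
        proposals,
        key=lambda p: proposal_score(p[1], p[2]),
        reverse=True,
    )
    # Rounding to the nearest whole grant is an assumption of this sketch.
    n_funded = int(round(funded_fraction * len(ranked)))
    return [agent_id for agent_id, _, _ in ranked[:n_funded]]
```

With a concave score of this kind, doubling bid time from 0.25 to 0.5 raises the score by less than the first 0.25 did, which is the trade-off that makes over-investing in bid preparation costly.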

Agents each have an individual research quality, which
is a figure ranging between 0 and 1. Each semester, agents
produce research output based on their research quality and
modified according to their time allocation strategy for bid
preparation. The final output numbers are conceptualised as
‘units of research’, i.e., scientific publications.

Agents who are funded benefit from an increase to their
research quality, which is intended to represent the material
benefits of research funding: increased resources, time
bought out from teaching obligations, and so forth. However,
getting proposals funded requires bid preparation time,
which in turn reduces research productivity. Agents make
these time allocation decisions by looking at their history of
successful applications in the recent past and altering their
time allocations to attempt to optimise their success rates -
the ‘Memory A’ strategy in Geard and Noble (2010). Here we
defined the recent past as the previous ten semesters, as shorter
memories produced more chaotic application behaviour due
to the volatility introduced by the regular influxes of new
agents in the baseline and postdoc scenarios.

Baseline Model: Simple Growing Population 

The core funding model begins with an initial population of
100 established academics which stays static as the simulation
runs for 100 semesters. For this paper we first developed
a basic extension of the model which assumes that some portion
of the disbursed grant funding is used to fund additional
permanent academic positions. This allows us to provide a
simple baseline case to compare with the postdoc scenarios.

Every semester, a number of additional academics are 
added to the system equal to half the number of disbursed 
grants, rounded up. These academics are given a random 
level of individual research quality, which is reduced slightly 
for the first two semesters to represent their acclimatisation 
process as they join the ranks of tenured faculty. Otherwise 
these new academics behave identically to the established 
academics. 
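The hiring rule for the baseline case can be sketched as below. The class and function names are hypothetical, and two details are assumptions of this sketch rather than statements from the paper: that the "slight" reduction equals the 0.3 default used elsewhere for new-position stress, and that it is applied multiplicatively.

```python
import math
import random
from dataclasses import dataclass

ACCLIMATISATION_SEMESTERS = 2  # from the text: first two semesters
NEWCOMER_PENALTY = 0.3         # assumed; the text only says "reduced slightly"


@dataclass
class Academic:
    research_quality: float    # in [0, 1]
    semesters_in_post: int = 0

    def effective_quality(self):
        """Research quality, reduced while the academic is settling in."""
        if self.semesters_in_post < ACCLIMATISATION_SEMESTERS:
            # Multiplicative reduction is an assumption of this sketch.
            return self.research_quality * (1.0 - NEWCOMER_PENALTY)
        return self.research_quality


def hire_new_academics(n_grants_disbursed, rng):
    """Create half as many new academics as grants disbursed, rounded up."""
    n_new = math.ceil(n_grants_disbursed / 2)
    return [Academic(research_quality=rng.random()) for _ in range(n_new)]
```

For example, `hire_new_academics(31, random.Random(0))` yields 16 new agents, each of whom works at reduced quality until `semesters_in_post` reaches 2.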

Postdoc Model 

Postdocs were added to the simulation through the implementation
of some additional mechanisms. While the core
funding model remains the same, as does the behaviour of
the established academics, in the postdoc case additional
agents with unique properties are added to the simulation
each semester.

When a grant is successfully funded, the agent who submitted
the grant receives the same beneficial increase to their
research quality as in the baseline case. In addition, every
grant funds a new postdoc who is then added to the simulation.
Postdocs differ from established academics in several
key aspects:

1. Postdocs spend 100% of their time doing research 

2. Postdocs are unable to apply for grants 

3. Postdocs are on fixed-term contracts, ranging from 4 to 
10 semesters in length 

4. New postdocs experience the same small reduction in
research quality as new academics in the baseline scenario

5. At the end of their contract postdocs experience a reduction
in research quality for their final two semesters
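The five properties listed above can be gathered into a small agent class. This is a sketch under stated assumptions: the names are hypothetical, the stress penalties are applied multiplicatively (the paper gives only their default size of 0.3), the start-of-contract reduction is assumed to last two semesters as in the baseline case, and output is taken to be time on research multiplied by effective quality.

```python
import random
from dataclasses import dataclass

NEW_POSTDOC_STRESS = 0.3  # default from the text
JOB_HUNTING_STRESS = 0.3  # default from the text
STRESS_SEMESTERS = 2      # final two semesters (text); first two assumed


@dataclass
class Postdoc:
    research_quality: float  # in [0, 1]
    contract_length: int     # 4 to 10 semesters
    semesters_served: int = 0

    def effective_quality(self):
        """Research quality after start-of-contract or end-of-contract stress."""
        remaining = self.contract_length - self.semesters_served
        quality = self.research_quality
        if self.semesters_served < STRESS_SEMESTERS:
            quality *= 1.0 - NEW_POSTDOC_STRESS   # settling in
        elif remaining <= STRESS_SEMESTERS:
            quality *= 1.0 - JOB_HUNTING_STRESS   # job hunting
        return quality

    def research_output(self):
        """Postdocs spend all of their time on research (no bid preparation)."""
        time_on_research = 1.0
        return time_on_research * self.effective_quality()


def new_postdoc(rng):
    """Create a postdoc with random quality and a 4-10 semester contract."""
    return Postdoc(research_quality=rng.random(),
                   contract_length=rng.randint(4, 10))
```

Postdocs have no grant-application method at all in this sketch, which mirrors property 2; a variant for systems where postdocs may apply for grants would need one.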

Note that the inability of postdocs to apply for grants is
not true everywhere - postdocs in the UK are unable to apply
for grants, but can do so in Australia, for example. In
some countries this varies between funding agencies, as in
the United States.

At the end of a contract, some postdocs will be fortunate
enough to get promoted into a permanent position. Promoted
postdocs are converted into established academics,
will become able to apply for grants, and must devote some
portion of their research time to bid preparation. The likelihood
of being promoted is determined by a parameter, the
promotion chance, which will be examined in detail later
in this paper. The default promotion chance is 15%, a rate
which was thought to be reasonable after comparing the
widely variable statistics between different UK higher education
organisations.

[Figure 1 plot: "Mean Output in Postdoc Scenarios"; x-axis: Scenarios (Growing Pop, noRQnoM, noRQM, RQnoM, RQM)]

Figure 1: Mean research output per academic across five
different scenarios. The Growing Population scenario
does not include postdocs.

In the real world, promotion is not always a direct result of
performance - timing, luck, geographical location, connections,
and even nepotism can play a role. In order to examine
the role of merit-based promotion in this model, we allow
the promotion process to be set to either take into account
an agent’s research quality, or to promote a random selection
of agents. In the former case, agents are ranked by research
quality and a percentage of the top-ranked agents corresponding
to the promotion chance will be advanced into
permanent positions. In the latter case, a random sample of
equivalent size is selected to be promoted. These two cases
are referred to as RQ and noRQ scenarios in the Results section.
Agents who are not promoted are removed from the
system and may no longer contribute to research output.
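The two promotion modes can be sketched as a single selection function. The function name, the rounding of the cohort fraction, and the assumption that selection is applied to the cohort of postdocs whose contracts end in the same semester are all illustrative choices, not details confirmed by the paper.

```python
import random


def select_promotions(qualities, promotion_chance, use_research_quality, rng):
    """Return the indices of postdocs promoted from an end-of-contract cohort.

    qualities: research quality of each postdoc in the cohort.
    promotion_chance: fraction of the cohort to promote (e.g. 0.15).
    use_research_quality: True for the RQ scenarios, False for noRQ.
    """
    # Rounding to a whole number of agents is an assumption of this sketch.
    n_promoted = int(round(promotion_chance * len(qualities)))
    indices = list(range(len(qualities)))
    if use_research_quality:
        # RQ: rank by research quality and take the top fraction.
        ranked = sorted(indices, key=lambda i: qualities[i], reverse=True)
        return ranked[:n_promoted]
    # noRQ: a random sample of the same size.
    return rng.sample(indices, n_promoted)
```

Both branches promote the same number of agents, so any difference in output between RQ and noRQ runs comes from who is promoted rather than how many.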

In order to model the notion that postdocs provide useful 
experience and thus increase the quality of new permanent 
academics, we included an option for a mentoring bonus for 
newly-promoted agents. In the mentoring scenario, newly- 
promoted agents gain a significant bonus to research quality 
to represent the proposed benefit of this intensive research 
experience. The mentoring and no-mentoring scenarios are 
referred to later as M and noM, respectively. In the default 
scenario, the mentoring bonus adds an additional 0.5 to an 
agent’s research quality. 

Results 

In order to examine the impact of fixed-term contracts on
academic research productivity, we analysed the research
output of a number of different scenarios. The first set of
scenarios compares a variety of postdoc settings with the
baseline growing population case. The second and third
sets of scenarios investigate the postdoc model more deeply,
looking at the impact of varying the promotion chances of
postdocs and the levels of job-hunting stress they experience
at the end of a contract, respectively. All three of these
sets of scenarios used default settings of the key parameters
as follows: postdoc promotion chance at 15%; mentoring
bonus to research quality at 0.5; new postdoc stress and
end-of-contract stress at 0.3. The mentoring bonus was turned
off in certain scenarios.

[Figure 2 plot: "Output in Postdoc Scenarios"; x-axis: Scenarios (Growing Pop, noRQnoM, noRQM, RQnoM, RQM)]

Figure 2: Total research output for the entire system
across five different scenarios. The Growing Population
scenario does not include postdocs.

Following these analyses, we recorded the results of 8,000
runs of the simulation across a comprehensive range of parameter
settings and then performed a detailed sensitivity
analysis. These techniques will be described in more detail
in the Sensitivity Analysis subsection below.

Scenario Set 1: Postdocs vs Permanent Academics 

The first set of scenarios compares four different postdoc
scenarios to the baseline scenario in which the population
consists entirely of permanent academics. These four scenarios
correspond to the four possible combinations of the
RQ and M parameter settings described above:


Table 1: Postdoc Scenarios - Set 1

Scenario Name    Settings
noRQnoM          Random promotions, no mentoring
noRQM            Random promotions, mentoring
RQnoM            Non-random promotions, no mentoring
RQM              Non-random promotions, mentoring


Figure 1 provides a comparison of mean research output
per academic across these five scenarios. The results are averaged
across fifty runs of the simulation for each scenario,
and the error bars indicate the standard deviation. The basic
growing population scenario outperforms all four postdoc
scenarios on this measure; in the postdoc scenarios, mentoring
appears to be the most important driver of increased
research output. Perhaps surprisingly, promoting postdocs
randomly or non-randomly seems to make little difference
to the final outcome. Results suggest that the drop in research
output in the postdoc scenarios may derive from the
instability introduced by a constant influx of postdocs with
unpredictable research quality; the fact that agents require
a longer memory in the postdoc scenario in order to settle
on stable time allocation strategies supports this interpretation.
Postdocs are also ineligible for grant-related research
bonuses, which negatively affects research output levels.

[Figure 3 plot; x-axis: Promotion Chance]

Figure 3: Mean research output per academic for five different
values of the promotion chance parameter.

Figure 2 shows another comparison between the five scenarios,
this time for total research output across the whole
academic system. The figures displayed here are the mean
final research outputs averaged across fifty simulation runs
for each scenario. Again we see that the basic growing population
case outperforms every postdoc scenario, even though
only half as many permanent academics are hired in that scenario.
Mentoring again takes precedence in the postdoc scenarios,
as the mechanics of promotion seem to make little
difference to the final outcome.

Scenario Set 2: Promotion Chance 

In this second set of scenarios we compared the final outputs
of simulation runs using five different values of the promotion
chance parameter. This parameter sets the likelihood
of a given postdoc being promoted into a permanent position.
In this set of scenarios we used parameter settings for
the RQM scenarios from Set 1, as these seemed to provide
the most favourable results among the possible postdoc scenarios.
All other parameters were kept at the default values
indicated in the model description above.

Figure 3 shows a comparison of the mean research output
per academic at the end of the simulation. Again each
scenario was run fifty times and the error bars indicate the
standard deviation. In this set of scenarios the 100% promotion
scenario was the most successful; these runs gave us the
highest values for mean research output out of any postdoc
scenarios we ran.

Figure 4: Total research output across the system for five
different values of the promotion chance parameter.

In Figure 4 we provide a comparison of total research output
across the entire system using the same five values for
promotion chance. Here we see a marked increase in overall
research productivity when 100% of postdocs are promoted -
the mean total output is significantly more than
double what we see at 15%. However, this clearly would
be the most expensive option in a postdoc-employing academic
world - in a later subsection we will examine the cost
issue in more detail.

Scenario Set 3: Job-Hunting and Stress 

In the third set of scenarios, we investigate the impact of
job insecurity on research output. As described in the introduction,
a number of studies of postdocs and fixed-term
academics have revealed the difficult consequences of insecure
and low-paid academic work on individuals. In order
to represent the potential negative impact of these stressors,
we have implemented a small research quality penalty, set
to 0.3 by default, which represents the impact of postdocs
needing to spend time searching for academic jobs, many of
which require lengthy and detailed application processes to
be completed, and the stress caused by impending
redundancy.

For these analyses we again collected data from sets of
50 simulation runs for five different values of the key parameter,
in this case the job-hunting/stress penalty applied
to postdocs reaching the end of their contracts. We decided
to set the upper bound for job stress at 0.7, as we felt it reasonable
to assume that most postdoc positions, while potentially
stressful, would likely not take up more than 70% of
researchers’ time due to that stress.

[Figure 5 plot; x-axis: Jobhunting Stress]

Figure 5: Mean research output per academic for five different
values of the job-hunting stress parameter.

Figure 5 shows a clear downward trend in mean research
output for individual academics as the level of job stress increases,
and Figure 6 shows a near-identical result for total
research output across the entire system. We note that in
all of these scenarios postdocs only represent approximately
10-15% of the total academic population at the end of a normal
simulation run, and yet the impact of this stress parameter
is very significant.

This result indicates the prominent role that even this
small population of postdocs has in the research landscape.
Established academics must divide their time between research
and bid preparation, and failed bids often lead researchers
to put enormous amounts of time into the next
round of preparations. As a consequence, established academics
tend to enter periods of ‘feast or famine’ in which
they either spend all their time writing bids and failing to
produce any research, or they succeed with several applications
in a row and feel safe in reducing their bid preparation
time in order to increase their research output - which is then
further increased by the research bonus added by the grant
itself.

In contrast, postdocs devote 100% of their available time
to research, and since they cannot apply for grants they are
not distracted from their work by the grant-funding lottery.
During the average simulation run postdocs frequently average
nearly double the research output of established academics,
with only top-achieving grant-holders able to exceed
their productivity. Postdocs thus take up the slack in
the research community while everyone else fights to win
grants. As a result, postdocs account for a significant fraction
of the overall research output, and thus reductions to
their productivity have a strong impact on the overall research
output of the academic population.


Figure 6: Total research output across the system for five
different values of the job-hunting stress parameter.

Sensitivity Analysis 

Despite the relative simplicity of the agent behaviours in this
model, the system does incorporate a number of elements
which may interact in unexpected ways. In order to further
understand the dynamics of the model we looked to uncertainty
quantification methods, which can allow us to delve
deeper into the effects of each model parameter on research
outcomes.

Our chosen method was inspired by a previous UK research
project known as Managing Uncertainty in Complex
Models, or MUCM (http://www.mucm.ac.uk/). The
MUCM team developed some specialised software specifically
for use in the analysis of complex computational models.
One of these pieces of software, GEM-SA, implements
a Gaussian process emulator, which allows us to perform an
in-depth sensitivity analysis of complex computational models
with multiple input parameters (O’Hagan, 2006), including
agent-based models (Silverman et al., 2013).

Detailing the construction of Gaussian process emulators
is beyond the scope of the current paper, so we recommend
reading Kennedy and O’Hagan (2001) for further details.
To summarise briefly, Gaussian process emulators provide a
measure of the influence of each individual input parameter
on the total output variance of the simulation. The emulator
works on the assumption that the single output variable
specified - total research output at the end of the simulation,
in this case - can be understood as a composition of a
series of main effects driven by the input parameters, interaction
effects for all combinations of those parameters, and
a constant term. In the current implementation, additional
uncertainty introduced by the computer code itself is also
taken into account. In essence, the emulator builds a statistical
model of the computer model based on an input training
set.
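The decomposition into a constant term, main effects and interaction effects can be written out explicitly. The notation below is the standard variance-based (ANOVA-style) form and is an assumption of this note rather than notation taken from the paper or from GEM-SA: f is the emulated output, x_1, ..., x_d are the input parameters (d = 4 here), and the inputs are assumed to be independent.

```latex
\begin{align}
f(x) &= f_0 + \sum_{i=1}^{d} f_i(x_i)
        + \sum_{i<j} f_{ij}(x_i, x_j) + \dots
        + f_{1 \dots d}(x_1, \dots, x_d), \\
\operatorname{Var}[f(X)] &= \sum_{i=1}^{d} V_i + \sum_{i<j} V_{ij} + \dots,
        \qquad V_i = \operatorname{Var}\bigl(\mathbb{E}[f(X) \mid X_i]\bigr), \\
S_i &= \frac{V_i}{\operatorname{Var}[f(X)]}.
\end{align}
```

Under this reading, the share of output variance attributed to a single parameter corresponds to 100 S_i, and the share attributed to a pair of parameters corresponds to 100 V_ij / Var[f(X)].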

For this sensitivity analysis we chose four key input parameters -
postdoc promotion chance, mentoring bonus for
just-promoted postdocs, and the stress caused by entering a
new position and by leaving a position. The final output of
interest was the total research output across the system at the
end state of the simulation.

Figure 7: Results of Gaussian Process Emulator demonstrating the impact of four input parameters on final research output
values for the whole system. The emulator was run with 400 different parameter combinations; each combination was run 20
times and the outputs averaged. Source: GEM-SA software (own calculations).

GEM-SA requires a large training set in order to produce
good results, so we generated a set of 400 possible parameter
combinations for these four inputs - the maximum allowable
in the GEM-SA software. Promotion chance values
ranged between 0.15 - 1.0, mentoring bonus between 0.3 -
0.7, and job stress for both entering and leaving positions
between 0.1 - 0.7. We then ran each one of those 400 settings
20 times, resulting in 8,000 total simulation runs, and
took the mean of the total research output for each setting,
then passed those results to GEM-SA.
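The construction of the training set can be sketched as follows. The paper does not say how the 400 combinations were generated, so independent uniform sampling is an assumption of this sketch (a space-filling design such as a Latin hypercube would be a common alternative). All function names are hypothetical, and `placeholder_model` is a stand-in for the real simulation, not the authors' model.

```python
import random
from statistics import mean

N_DESIGN_POINTS = 400  # from the text
RUNS_PER_POINT = 20    # from the text: 400 x 20 = 8,000 runs

# Parameter ranges as stated in the text.
PARAMETER_RANGES = {
    "promotion_chance": (0.15, 1.0),
    "mentoring_bonus": (0.3, 0.7),
    "new_postdoc_stress": (0.1, 0.7),
    "job_hunting_stress": (0.1, 0.7),
}


def sample_design(n_points, ranges, rng):
    """Draw n_points parameter combinations (uniform sampling is assumed)."""
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        for _ in range(n_points)
    ]


def build_training_set(design, run_model, runs_per_point, rng):
    """Average the model output over repeated runs at each design point."""
    rows = []
    for point in design:
        outputs = [run_model(point, rng) for _ in range(runs_per_point)]
        rows.append((point, mean(outputs)))
    return rows


def placeholder_model(params, rng):
    """Stand-in for one simulation run; NOT the authors' model."""
    return 100.0 * params["promotion_chance"] + rng.gauss(0.0, 1.0)
```

A call such as `build_training_set(sample_design(N_DESIGN_POINTS, PARAMETER_RANGES, rng), placeholder_model, RUNS_PER_POINT, rng)` produces 400 rows of (parameters, mean output), which is the shape of data an emulator would be trained on.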

Table 2 provides a summary of the GEM-SA output after
41,000 runs of the emulator. We can clearly see that
the single largest driver of research output in these postdoc
scenarios is the likelihood of postdoc promotion, which accounts
for 86.43% of the final output variance. The mentoring
bonus provided to newly-minted academics is the second
largest contributor, accounting for 8.87% of the output variance.
Job-hunting stress at the end of a contract plays a small
role in the final results, but interestingly stress due to entering
a new position is largely inconsequential - this could be
due to the tendency for new academics to struggle to achieve
consistent outcomes regardless of their stress levels, at least
until they settle into a more stable pattern of bid preparation.

Table 2: Effect on output variance from input parameters

Parameter                      Variance (%)
Promotion Chance               86.43
Mentoring Bonus                 8.87
New Postdoc Stress              0.08
Job-Hunting Stress              2.57
Promotion x Mentoring           1.31
Promotion x New Stress          0.01
Promotion x Job-Hunt Stress     0.69
Other Interactions              0.02

In Figure 7 we provide the graphs generated by the GEM-SA
software, which show the effects of each input parameter
on total research output. The graphs demonstrate that as
the postdoc promotion chance increases, the total research
output increases as well - and once again the effect on the
final output is significant despite the relatively small size of
the postdoc population. Similarly, the size of the mentoring
bonus applied to newly-promoted postdocs has a positive
impact on total research output, although significantly less
pronounced than the effect of promotion chance. Both starting
a new job and reaching the end of a contract appear to
impact negatively on research output, though the influence
of the latter is somewhat variable. This makes intuitive
sense, given that the number of postdocs eligible for redundancy
in each simulation will vary significantly depending
on random factors in the simulation, in contrast to new-job
stress, which every postdoc is guaranteed to experience.
Return on Investment


Discussion 


While the results to this point reinforce the interpretation
that higher rates of postdoc promotion lead to greater research
output, in the real world this would have substantial
cost implications. Postdocs are welcomed by universities as
employees given that their salaries are paid for by external
funding in most cases - taking those employees on as permanent
academics requires a significant investment from the
university’s point of view.

In order to better judge the cost-effectiveness of these
scenarios, we implemented a very simple return on investment
(ROI) calculation as a rough indicator of relative performance
between scenarios. The simulation compares its
total research output in a given time step to a funding-free
scenario in which we calculate the research output of the
current agent population if they were able to spend 100%
of their time doing research only. Research outputs linked
to funding - grant-related research quality bonuses and all
postdoc research output - are removed from this funding-free
baseline. The ROI is then defined as the difference between the
funded research output and the funding-free output, divided by
the amount of funding disbursed. This gives us a measure of the
amount of additional research purchased with each unit of funding.
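
The ROI definition above amounts to a one-line calculation. The sketch below is only an illustration of that definition; the function and argument names are hypothetical and the numbers in the example are made up.

```python
def roi(funded_output, funding_free_output, funding_disbursed):
    """Additional research output obtained per unit of funding disbursed.

    funded_output: total research output in the funded scenario.
    funding_free_output: output of the same population with no funding,
        i.e. everyone spends all of their time on research and all
        funding-linked outputs are removed.
    funding_disbursed: total funding handed out (must be positive).
    """
    if funding_disbursed <= 0:
        raise ValueError("funding_disbursed must be positive")
    return (funded_output - funding_free_output) / funding_disbursed


# Made-up numbers: the funded system produces less than the funding-free
# baseline, so the ROI is negative.
example = roi(funded_output=900.0, funding_free_output=1000.0,
              funding_disbursed=500.0)
```

A negative value means that the funded system produced less research than the funding-free baseline would have.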

Figure 8 shows a comparison of ROI for five promotion
chance scenarios. Note that all results are negative -
in other tests we also found that postdoc scenarios produced
less research than the base case despite the increase in
investment. Perhaps surprisingly, ROI becomes less poor
in the higher promotion chance scenarios - so despite the
additional cost, in a world with postdocs, promoting more
of them seems to pay dividends in terms of increased
research output for the money spent.



Figure 8: Results of ROI calculations for five postdoc 
promotion scenarios. 


While the core functionality of this simulation is relatively
simple, understanding the complex agent behaviour and its
consequences requires in-depth analysis. The multiple sets
of scenarios presented here are intended to provide a relatively
complete picture of the simulation outcomes across a
range of parameter settings, and to give a comparison between
the postdoc and non-postdoc scenarios.

In Scenario Set 1 we compared the postdoc scenarios with
a growing population in which half as many permanent academics
were hired. Notably, in every case the non-postdoc
scenarios produced higher individual research productivity
and higher total productivity. In Scenario Sets 2 and 3 we
examined two unique properties of the postdoc agents: their
chance for promotion, and the job-hunting stress they feel
toward the end of their contracts. We found that higher promotion
chances lead to significantly higher research output,
both individually and for the whole population. Unsurprisingly,
we found that higher stress leads to lower output - but
the effects were surprisingly strong given the small size of
the postdoc populations. The sensitivity analysis reinforces
the results of Scenario Set 2, showing that postdoc promotion
chance drives the majority of the output variance in
the postdoc scenarios.

These results lead us to conclude that in this simple model
of postdoc careers in a competitive funding environment, the
career path of postdocs has a significant impact on research
productivity across the academic system. Postdocs end up
accounting for a large fraction of the overall research output
in the simulation while established academics get caught up
in competing for grant funding, so the impact of job-related
stress and poor mentoring is also felt across the population.

Unfortunately this does not bode well for real-world
academia, as studies repeatedly confirm the poor mentoring
and career development offered to postdocs around the
world (Felisberti and Sear, 2014; Akerlind, 2005). Postdocs
regularly report significant anxiety about their career
prospects, problems making ends meet financially, and a
lack of career guidance and institutional support. The simulation
shows that leaving postdocs to shoulder these burdens
unsupported may have an unexpectedly severe impact on our
research productivity.

There are indications that this careers guidance aspect is
being taken more seriously. In the UK, institutions have
signed up in large numbers to the Concordat to Support the
Development of Researchers (Vitae, 2008), an agreement
which calls on institutions and funders to develop strong
frameworks for researchers’ career development. However,
this simulation suggests that offering supportive work environments
and career advice may not be sufficient - maximising
the sector’s research potential would involve a more
substantive rethink of the current state of academic careers
and funding.

This simulation is only an abstract representation of the
funding and careers situation in academia, and should not
be taken as a recipe for policy at this early stage. However,
these results do give us a sense of the dynamics at work
between competitive funding systems and postdoctoral researchers.
In a system which highly incentivises senior academics
to spend significant time on grant applications, postdocs
are intended to fill the research gaps - but when those
same postdocs’ careers are tied to insecure short-term funding,
research funders end up actually getting fewer outputs
for their money.

Future work will need to examine these systems in more
detail. At the moment, grants are represented very simply:
one grant is much like another, and any single grant attracts
only one postdoc. Further differentiation between types of
research funding may help us develop new methods of
research funding disbursement that could alleviate some of
these issues. Future versions of the model would also benefit
from additional detail on the postdoc life course - postdocs
in this simulation only have one contract and one attempt to
achieve promotion, whereas in the real world postdocs often
work on a succession of fixed-term contracts.

Similarly, the treatment and experience of postdocs varies
significantly between countries, disciplines, and even between
individual institutions - this model is based on the
postdoc situation at research-intensive universities in the
UK. In the real world postdocs may face a wide variety of
obstacles depending on where they are employed, which
could substantially change their coping strategies. Understanding
the lived experience of postdocs through quantitative
and qualitative studies and incorporating this data into a
more sophisticated decision-making model would allow for
a more detailed representation of the varied postdoc career
landscape.

While these results look dire for the postdoc scenarios, we
do not believe this is due to unjust assumptions on our part.
This model is relatively optimistic: more postdocs get promoted
in the simulation than in many real-world academic
systems; ROI calculations do not include costs like redundancy
payments or training costs for new postdocs; and 30%
of all grants are funded regardless of the number of applicants.
Even in this relatively positive environment, postdoc
scenarios still underperform compared to non-postdoc scenarios,
and our return on investment is quite poor. While
we reiterate that the model is at too early a stage to serve as a
driver for substantive policy, we suggest that it provides food for
thought when the academic community wishes to evaluate
its performance, both as researchers and as employers.
Abstract

The Axelrod model of cultural dissemination is a convenient
analogue to the description of archaeological cultures based
on a series of material features, such as styles of pottery, agriculture,
domestication, etc. Allowing a population to spread
into uninhabited, or sparsely inhabited, territory, while undergoing
cultural interaction, generates a ‘wave front’ containing
larger homogeneous cultures, with a backwater of diversity.

A very similar process is observed in the neolithic transition
- the arrival of the first farming technology at the end of the
Mesolithic - in south-eastern Europe (c. 8000-6000 cBC),
where the first observable neolithic cultures are large and homogeneous,
and these are succeeded by greater diversity. The
model presented here demonstrates how the dynamics of a
spreading wave can explain the observed progression from
large, spreading cultures to smaller, more diverse cultures.

Introduction 

Archaeological cultures are commonly identified based on a
set of material features common to sites belonging to that
culture. Such features can include the types or styles of
pottery, stone tools, agriculture, or animal husbandry used,
among others (Clarke, 1968). For sites related to the neolithic
transition in south-eastern Europe, analysis of this
material culture shows a progression from large, homogeneous
and distinct cultures present initially (Figure 1a), to
smaller, more diverse and inter-related cultures 300-500 years
later (Figure 1b).

Data such as that for the latter sites can be difficult to
interpret, as it does not allow cultures to be easily distinguished
based on any one, or even several, material features.
There is both a greater overlap between the features of spatially
distant sites, and a greater variety in the features of
nearby sites, when compared with data from the earlier sites.

As demonstrated by Bocquet-Appel et al. (2009), the
neolithic transition is accompanied by a ‘wave front’ of
increased population that spreads gradually into Europe.
Modelling this spread of a population into a previously unoccupied,
or sparsely occupied, region suggests that the apparent
progression from homogeneous to diverse culture is
a natural consequence of the spreading process, rather than
being an artefact of noisier data.


Theory 

Assuming a culturally heterogeneous initial population that 
is bordered, in some or all directions, by unoccupied area, an 
expanding front will form along any boundary which is not 
enclosed. Whereas cultures in a saturated area may spread 
by interaction with neighbouring cultures, those on the front 
have access to unoccupied neighbouring sites, and are able 
to replicate themselves into that area without modification, 
allowing them to expand much more quickly than if they 
were enclosed. 

Over a longer time period, the greater diversity behind the
front also spreads, through interaction rather than
replication, and may erode the original front cultures.
These diverse cultures both compete and mix with each
other, forming an intermixed assortment of cultures: some
growing larger; some being eliminated; and others dividing
into partly related subcultures.

If the rate of spreading/replication is sufficiently high
compared to the rate of cultural interaction, then large areas
of homogeneous culture can form in the spreading front
faster than they are eroded by cultural interaction with
dissimilar cultures behind them.

Unlike replication, this cultural interaction is not directed
by the availability of space, and spreads much more slowly
into the area occupied by the front. When this diversity
does reach an area outside the initial area, possibly
long after the front population, it will consist of several
small to medium-sized cultures, larger the further they have had
to travel from the initial area.

The arrival of this chaos is visible in the correspondence
analysis (CA) results (Figure 1b) as both a greater diversity within
closely spaced sites, and less distinction between distantly related
sites - which may previously have been located in different
culturally homogeneous streams.

Model 

The cultural dissemination model of Axelrod (1997) provides
a good analogue to the description of archaeological
cultures based on assemblages of material features. This is
adapted to the scenario of a spreading population by including
initially unpopulated loci, into which a culture can clone
itself without modification, in a demographic step which occurs
once for every N typical interaction steps.

Figure 1: Correspondence analysis of animal species’ distribution in zooarchaeological assemblages for (a) early (c. 6000-5400
BC) and (b) late (c. 5400-4500 BC) Neolithic. Points are colored by geographic region (c); not all sites are present in both plots
(a) and (b).

Figure 2: One realisation of the modified Axelrod model
showing the spread of two large homogeneous wave front
cultures (a), followed by their erosion by more diverse cultures
(b). Line thickness indicates level of cultural dissimilarity;
unshaded areas are unoccupied.
(a) Large homogeneous culture on wave front, after 90,000 steps
(b) Diverse cultures erode territory of original front, after 240,000 steps

A cultural interaction consists of comparing two randomly
selected neighbouring sites, and with probability
equal to their cultural similarity - the proportion of features
for which they are equal - setting one of the dissimilar features
in one site to the value in the other. A demographic step
consists of randomly selecting two neighbouring sites, and
if one is occupied and the other is not, cloning the culture of
the occupied site into the unoccupied site. This simple addition
results in the formation of the homogeneous front described
above, along with a more diverse ‘backwater’, which
gradually encroaches on the area first occupied by the front.
Simultaneously, the cultures in this backwater are mixed, resulting
in some cultures growing geographically to the exclusion
of others.

Importantly, this model simulates the behaviour of culture,
not people. Sites are either occupied or not, and need
to be occupied to have culture. The spread of one culture
into another does not imply any change in the population
there, only a change in their culture.

The model is also neutral - no trait or feature is selected 
over any other - all interactions happen at random, as in the 
original Axelrod model. The only fitness differential arises 
from the ability of sites with unoccupied neighbours to clone 
themselves. 
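
The two kinds of step described in this section can be sketched as follows. This is a minimal illustration rather than the implementation used for Figure 2: the square lattice with four neighbours, the number of features and traits, the initial condition (one occupied column), and the choice of which site is updated are all assumptions made for the example.

```python
import random

F = 5  # features per culture (assumed value)
Q = 3  # possible traits per feature (assumed value)


def similarity(a, b):
    """Proportion of features on which two cultures agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def random_neighbour_pair(size, rng):
    """Pick a random site and one of its (up to four) lattice neighbours."""
    r, c = rng.randrange(size), rng.randrange(size)
    options = [(r + dr, c + dc)
               for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
               if 0 <= r + dr < size and 0 <= c + dc < size]
    return (r, c), rng.choice(options)


def interaction_step(grid, rng):
    """With probability equal to similarity, copy one differing feature."""
    (r1, c1), (r2, c2) = random_neighbour_pair(len(grid), rng)
    a, b = grid[r1][c1], grid[r2][c2]
    if a is None or b is None:
        return  # unoccupied sites have no culture to interact with
    differing = [k for k in range(F) if a[k] != b[k]]
    if differing and rng.random() < similarity(a, b):
        k = rng.choice(differing)
        a[k] = b[k]  # assumption: the first-picked site adopts the trait


def demographic_step(grid, rng):
    """Clone an occupied site into an unoccupied neighbour, unmodified."""
    (r1, c1), (r2, c2) = random_neighbour_pair(len(grid), rng)
    a, b = grid[r1][c1], grid[r2][c2]
    if a is not None and b is None:
        grid[r2][c2] = list(a)
    elif a is None and b is not None:
        grid[r1][c1] = list(b)


def occupied(grid):
    """Number of occupied sites."""
    return sum(site is not None for row in grid for site in row)


def run(grid, steps, n, rng):
    """One demographic step for every n interaction steps."""
    for t in range(1, steps + 1):
        interaction_step(grid, rng)
        if t % n == 0:
            demographic_step(grid, rng)


rng = random.Random(1)
SIZE = 10
grid = [[None] * SIZE for _ in range(SIZE)]
for r in range(SIZE):  # heterogeneous initial population along one edge
    grid[r][0] = [rng.randrange(Q) for _ in range(F)]
initial = occupied(grid)
run(grid, steps=5000, n=5, rng=rng)
final = occupied(grid)
```

Because sites are never vacated in this sketch, the occupied area can only grow, and the ratio `n` controls how quickly the front replicates relative to cultural interaction.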

Conclusion 

The ability of cultures on a spreading front to replicate
themselves without mixing or modification, followed by the
erosion of this front by more diverse culture over time, may
offer the first process-based explanation of the observed progression
from initial homogeneity to later diversity, common
to many spreading phenomena in archaeology.
Indirect Reciprocity (IR) is possibly the most elaborate
and cognitively demanding mechanism of cooperation discovered
so far. It involves status and reputations and has
been heralded as providing the biological basis of our morality
(Nowak and Sigmund, 2005). Whereas under direct reciprocity
one expects to receive help from someone one has
helped before, under IR one expects a return, not from someone
one helped, but from someone else; in this sense, helping
the “right” individuals may contribute to a reputation uplift
that increases the chance of being helped, by someone
else, at a later stage. This reputation shift depends on the
socially adopted norm that defines what actions (and under
which contexts) are reckoned as good or bad. Most theoretical
models employed to date have studied how IR can lead
to the emergence and sustainability of cooperation in infinite
populations (Ohtsuki and Iwasa, 2006; Nowak and Sigmund,
2005). However, it is known that cooperation, norms,
reciprocity and the art of managing reputations are features
that date back to primitive, small-scale societies. While different
features characterize a primitive society, here we take
into consideration three of the most important: the evidence
that interactions occur within small tribes, the central role
played by reputations, and the ease with which reputation
information spreads within the tribe. In small populations,
stochastic finite-size effects are not only important, but may
even render infinite-population analyses misleading (Imhof
et al., 2005). Thus, it remains an open question which norms
prevail in small-scale societies and what influence they have on
the evolutionary dynamics of IR.

With the current extended abstract, we would like to summarize
a new analysis of this problem. In Santos et al. (2016)
we show that population size strongly influences the merits
of each social norm, while proposing a new formal tool to assess
the evolutionary dynamics of reputation-based systems
in finite populations. We investigate to what extent norms
found to promote cooperation in large populations
remain effective in small societies, and also to what extent the
capacity of a social norm to foster cooperation depends on
the community size. We consider a population of individuals
who randomly interact in pairs through a donation game,
where one player is a potential provider of help (donor) to
the other (recipient). The donor may cooperate and help the
recipient at a cost c to herself/himself, conferring a benefit b
on the recipient (with b > c); otherwise no one pays any costs
or distributes any benefits. Reputations are public and attributed
by a bystander who witnesses a pairwise interaction.
We adopt a world of binary reputations, Good (G) and Bad
(B), which in our case are mere labels with no a priori meaning.
Their significance emerges in association with individual
behavior in connection with the donation game. This binary
reputation scheme, despite its formal simplicity, allows
one to consider a plethora of moral rules of variable complexity,
and it is especially amenable to a systematic mathematical
treatment in the framework of population dynamics. To
perform an evaluation, the bystander uses a social norm, that
is, a rule that converts the combined information stemming
from the action of the donor and the reputation of the recipient
into a new reputation for the donor. Social norms encoding
this type of information are classified as 2nd-order norms
(Ohtsuki and Iwasa, 2006). Four of these social norms have
been given special attention (see matrices in Fig. 1): Stern-judging
(SJ, also known as Kandori), which assigns a good
reputation to a donor that helps a good recipient or refuses
help to a bad one, assigning a bad reputation in the other
cases (Pacheco et al., 2006); Simple-Standing (SS), similar
to SJ, but more “benevolent” by assigning a good reputation
to any donor that cooperates; Shunning (SH), similar
to SJ but less “benevolent”, by assigning a bad reputation
to any donor that defects; and Image Score (IS, actually a
first-order norm), where all that matters is the action of the
donor, who acquires a good reputation if playing C and a
bad reputation if playing D (Nowak and Sigmund, 2005).
In the space of 2nd-order norms that we consider, a duple
p suffices to unambiguously define a strategy, by specifying
the action directed at a G or B recipient. This leads
to the following 4 possible strategies: unconditional Defection
(AllD, p = (D, D)), unconditional Cooperation (AllC,
p = (C, C)), Discriminator strategy (Disc, p = (C, D),
that is, cooperate with those in good reputation, and defect
otherwise), and paradoxical Discriminator strategy (pDisc,
p = (D, C), the opposite of Disc). Unlike previous studies,
in Santos et al. (2016) we investigate the evolutionary
dynamics of these 4 strategies within finite populations by
means of a stochastic birth-death process, both analytically
and through large-scale computer simulations (for details,
please see Santos et al. (2016)). As detailed in Fig. 1, we
show that SJ clearly stands out for small population sizes,
and dominates together with SS for large population sizes. Indeed, it
can be shown that only SJ and SS are able to combine a high
prevalence of an ALL-Disc configuration with the incidence
of Good reputations in this configuration, efficiently fostering
high levels of cooperation (Fig. 1). Yet, in small-scale
societies, SJ promotes significantly more cooperation than
SS, as the latter fails to prevent the invasion of unconditional
defectors (AllD) in small populations. On the other hand,
SJ fosters an ideal coordination between strategy and prevailing
reputations, assuring the stability of configurations
where individuals cooperate in the donation game. Indeed,
the high degree of symmetry of SJ allows the promotion of
cooperation, irrespective of the emerging meaning of G
and B labels, also allowing “paradoxical” strategies to prevail
and promote cooperation. These results remain valid for
a wide interval of reputation assignment time-scales, errors
of execution and reputation-assignment inaccuracies.

Figure 1: Stern-judging (SJ) is able to foster the highest rates of cooperation, independently of the (finite and small) population
size. By contrast, the efficiency of SS in fostering cooperation strongly depends on the population size and on the error rate
committed by individuals (Santos et al., 2016). SH harms cooperation by being too strict compared to SJ due to the abusive
widespread assignment of bad labels. See main text and Santos et al. (2016) for details on the strategic dynamics induced by
each social norm. The matrices illustrate the 4 dominant social norms in terms of the new reputation (B/G, inside each square)
attributed to a donor given its action (C/D, rows) and the reputation (B/G, columns) of the recipient.
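
The four norms and four strategies described above can be written as small lookup tables. The sketch below only illustrates these definitions; it is not the framework of Santos et al. (2016). The function names and the values of b and c are assumptions for the example, and execution and assignment errors are left out.

```python
G, B = "G", "B"  # reputations (mere labels)
C, D = "C", "D"  # actions: cooperate, defect

# Each norm maps (donor action, recipient reputation) -> new donor reputation.
NORMS = {
    "SJ": {(C, G): G, (C, B): B, (D, G): B, (D, B): G},  # Stern-judging
    "SS": {(C, G): G, (C, B): G, (D, G): B, (D, B): G},  # Simple-Standing
    "SH": {(C, G): G, (C, B): B, (D, G): B, (D, B): B},  # Shunning
    "IS": {(C, G): G, (C, B): G, (D, G): B, (D, B): B},  # Image Score
}

# A strategy p = (action towards a G recipient, action towards a B recipient).
STRATEGIES = {
    "AllD": (D, D),
    "AllC": (C, C),
    "Disc": (C, D),
    "pDisc": (D, C),
}


def act(strategy, recipient_rep):
    """Action a donor with the given strategy takes towards the recipient."""
    to_good, to_bad = STRATEGIES[strategy]
    return to_good if recipient_rep == G else to_bad


def donation_round(strategy, recipient_rep, norm, b=5.0, c=1.0):
    """One donation game: returns (action, donor payoff, recipient payoff,
    new donor reputation). The values of b and c are illustrative (b > c)."""
    action = act(strategy, recipient_rep)
    donor_payoff = -c if action == C else 0.0
    recipient_payoff = b if action == C else 0.0
    new_rep = NORMS[norm][(action, recipient_rep)]
    return action, donor_payoff, recipient_payoff, new_rep
```

For instance, a Disc donor who refuses to help a B recipient is judged Good under SJ but Bad under SH, which is the strictness that the Figure 1 caption refers to.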

To conclude, a single social norm (SJ) emerges as the
leading norm in small-scale societies. That simple norm
dictates that only whoever cooperates with good individuals,
and defects against bad ones, deserves a good reputation.
Remarkably, this pattern is consistent with recent empirical
results (Hamlin et al., 2011) showing that toddlers
positively evaluate i) those who treat others prosocially, ii)
those who behave negatively towards those who have acted
antisocially, and iii) puppets that harm antisocial puppets.
This said, behavioral experiments in this context remain a
vast open territory and a very active area of research. Finally,
our modeling framework has the advantage of being naturally
extendable to social norms of higher order, enlarging
the complexity of the norms studied to date. Work along
these lines is in progress, with promising preliminary results
of interest within the area of evolution of biological complexity,
and to the ALife community in general.
Abstract

Cooperation in scale-free networks has proven to be very robust
against removal of randomly selected nodes (error) but highly
sensitive to removal of the most connected nodes (attack). In this
paper we analyze two comparable types of node removal in
which the removal selection is based on tournaments where the
fittest (raids) or the least fit (battles) nodes are chosen. We
associate the two removals with two types of Maya warfare
offences during the Classic period. During this period of at least
500 years, political leaders were able to sustain social order in
spite of attack-like offences to their social networks. We present
a computational model with a population fluctuation mechanism
that operates under an evolutionary game theoretic approach
using the Prisoner's Dilemma as a metaphor of cooperation. We
find that, paradoxically, battles are able to uphold cooperation
under moderate levels of raids, although raids do have a strong
impact on the network structure. We infer that cooperation does
not depend as much on the structure as it does on the underlying
mechanism that allows the network to readjust. We relate the
results to the Maya Classic period, concluding that Mayan
warfare by itself cannot entirely explain the Maya political
collapse without appealing to other factors that increased the
pressures against cooperation.

Introduction 

An intriguing peculiarity of Mayan warfare during the
Classic period was the corporeal involvement of the elites,
especially because their most relevant members, i.e. nobility,
often became direct targets of offences in the form of raids
(Webster, 2000). This raises the question of what impact nobility
losses would have had on government, and how the political
class of the elite organized itself in order to keep the necessary
cooperation required to sustain social order. The types of elite
casualties rendered by Mayan warfare resemble scenarios that
have been studied in the literature of social network analysis,
specifically when two types of node removal, called error and
attack, are analyzed in scale-free (sf) networks (Albert et al.,
2000), i.e. networks in which the distribution of the number of
connections of each node follows a power law; they are therefore
considered highly heterogeneous networks. On one hand,
the random removal of nodes (error) can be equated to
casualties of the general population, including lower ranking
elites that fought as warriors. On the other hand, removal of the
most connected nodes (attack) can be likened to specific
nobility-targeting raids, in which the most central members of
the social hierarchy were the main victims. Previous studies
revealed that although the sf structure and cooperation are very
robust to error, they are highly vulnerable to attack (Callaway
et al., 2000; Cohen et al., 2000, 2001; Perc, 2009). At the same
time, some research has suggested that there is a strong relation
between network structure and cooperation, and, in particular,
that the heterogeneity of a network drives cooperation
(Ichinose et al., 2013; Santos et al., 2012).

In this context, we find the Classic Maya period puzzling
because it constitutes a counterexample: its warfare involved
attack-like victims such as city rulers (Martin and Grube,
2008), and yet it exhibited a relatively stable social and political
organization that maintained complex levels of social order for
at least 500 years until widespread political collapse around
AD 800 (Webster, 2002). We also know that there was a drift
towards a less hierarchical (i.e. less heterogeneous) political
structure during the Classic period (Jackson, 2013). This
contradicts previous findings, which suggest that the loss of
network heterogeneity should have brought about a decline in
cooperation; no such decline occurred, at least for the indicated
time span.

We attempt to clarify these contradictions by adapting a
computational model from Miller and Knowles (2015a,
2015b) and including mechanisms similar to error and attack.
These mechanisms (called battles and raids respectively) differ
in that they involve tournaments among a randomly selected group
of nodes. In each tournament, a loser (for battles) or a winner
(for raids) is selected based on its fitness score, which is calculated
according to the performance of that node playing the Prisoner’s
Dilemma (Nowak and May, 1992). Miller and Knowles'
model also allows for fluctuation of network size (i.e. growing by
addition of nodes up to a maximum number and then shrinking
by means of removal of a certain number of nodes via a series
of tournaments), which turns out to promote cooperation.
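
The tournament-based removal described above can be sketched as follows. This is a hedged illustration of the selection rule only, not the Miller and Knowles model: the function name, the dictionary representation of fitness scores, and the group size are assumptions, and the network itself is left out.

```python
import random


def tournament_removal(fitness, group_size, mode, rng):
    """Remove one node chosen by a tournament among a random group.

    fitness: dict mapping node id -> fitness score (e.g. accumulated
        Prisoner's Dilemma payoff); modified in place.
    mode: "battle" removes the least fit group member,
          "raid" removes the fittest group member.
    Returns the id of the removed node.
    """
    group = rng.sample(sorted(fitness), group_size)
    if mode == "battle":
        victim = min(group, key=fitness.get)
    elif mode == "raid":
        victim = max(group, key=fitness.get)
    else:
        raise ValueError("mode must be 'battle' or 'raid'")
    del fitness[victim]
    return victim


# Illustrative use with made-up fitness scores for ten nodes.
rng = random.Random(0)
scores = {node: float(node) for node in range(10)}
removed_in_battle = tournament_removal(scores, group_size=3, mode="battle", rng=rng)
removed_in_raid = tournament_removal(scores, group_size=3, mode="raid", rng=rng)
```

With a small group size the tournament is only mildly selective, whereas a group spanning the whole population always removes the globally least fit (battle) or fittest (raid) node.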

Metaphorically, we can imagine that our model represents 
the network structure of the elite fraction of a given Maya 
polity. A random subset of these elite individuals will form 
bands of warriors, which engage in repeated battles to protect 
their city from attacks and to attack other cities. Who is most 
likely to perish in these battles depends to a large extent on 
whether the nobility is the main protagonist (and target) of 
these battles, or whether it is mainly a matter of survival of the 
fittest (i.e. death of the least fit). 
Our results show that the nobility-targeting raids do not
immediately affect cooperation in the overall network of elite
individuals, although they change the network structure
towards a less hierarchical topology. This result supports the
theory that Maya society could have sustained moderate levels
of systematic warfare that involved the elite for an indefinite
time. A collapse can, however, be explained by a significant
increase of warfare, such as occurred in the Petexbatun region
(Webster, 2000), or, alternatively, by a secondary factor that
introduced pressure on the living conditions at the end of the
Classic period, e.g. an environmental change, economic crisis
or land degradation. Our results are also consistent with
archaeological observations of a shift from a highly centralized to
a more distributed form of network structure that occurred in
the Classic period (Jackson, 2013). Therefore, assuming that
the archaeological account of Maya warfare is correct, our
model suggests that nobility-targeting may be an explanation
of the structural changes in the political system of the Maya.

Mayan warfare: battles and raids 

Archaeological findings suggest that the inhabitants of Mayan
cities, and in particular their elites, were involved in constant
warfare (Webster, 2000). Evidence of warfare can be found in
the Preclassic period (2500 - 250 BC; see footnote 1), but most of
the conflict is recorded to have developed during the Late Preclassic (400
BC - AD 250) and Classic periods (AD 250 - 1000)
(O’Mansky and Demarest, 2007). In the Early Classic period
(AD 250-600), warfare was characterized by small, sporadic
raids, whose main objective was seemingly the capture and
subsequent sacrifice or imprisonment of nobility (highly
important members of the elites). This practice intensified
across the Late Classic period (AD 600-800) (O’Mansky and
Demarest, 2007), and culminated in numerous city sackings
and burnings during the Terminal Classic period (AD 800-1000)
(Normak, 2007).

The Maya elites were both part of the attacks (as warriors, 
i.e. "soldiers") and their main targets (Webster, 2000). For
example, evidence from Aguateca (Peten, Guatemala) indicates 
that in one particular war, which was carried out to eliminate 
another Maya state (AD 810), members of the elite made up the 
majority of warriors (Aoyama, 2005). Additionally, warfare- 
related art and inscriptions emphasize high-ranked individuals 
(Stuart, 1993; Van Tuerenhout, 2001), and this has led some 
archeologists to argue that wars were fought between the elites 
exclusively (Freidel, 1986). This would imply a small number
of warriors, a maximum of 600 to 1000 for Tikal, which was
one of the largest Mayan cities (Hassig, 1992), or 500 to 600
for Copan, for which very accurate demographic estimates exist
(Webster et al., 2000, 1992). Therefore some researchers have
argued that the war forces must have also involved commoners;
however, the direct involvement of the elites is not disputed
(Webster, 2000).

Additionally, the nobility members were often the main 
targets of the attacks. For example, the capture and sacrifice of 
the ruler of Copan (Honduras) by Quirigua (Guatemala) in 
AD 738, the capture and unknown fate of the ruler of Tikal 
(Peten) by Caracol (Peten) in AD 562, the capture and 
vassalage of the ruler of Seibal (Peten) by Dos Pilas (Peten) in 
AD 735, or the capture of the ruler of Naranjo (Peten) by Tikal 

1 All the time periods are based on (Webster, 2000) 


in AD 744 (Martin and Grube, 2008). These examples are
clearly some of the most important since they were direct
attacks on the main ruler; however, the capture and sacrifice of
enemies was a common practice, as it has also been associated
with status rivalry, i.e. competitive behavior exhibited by elite
members to increase their status (O’Mansky and Demarest,
2007): they fought to assert their roles in society and their areas
of influence (for example, in their roles in the royal courts) by
means of war merits. We can safely assume that the higher the
captives’ rank, the higher the merit.

The dual role of individuals as warriors and elite members 
(or work force) holds true as no evidence of standing armies 
has been found (Van Tuerenhout, 2001). Beyond taking part in 
warfare, residents of Maya cities must have had other 
responsibilities including the elite political roles of the nobility; 
the loss of these individuals due to warfare would then imply 
changes in the structure of the elite social network. 

We stress a distinction between two warfare scenarios: (1) 
raids with the goal to capture (and often sacrifice) nobility 
members, and (2) relatively large-scale battles between sites 
(although in reality they are not mutually exclusive). The two 
scenarios would have resulted in different outcomes: in the first 
scenario, no matter whether attackers or defenders emerged 
victorious, it would result primarily in nobility victims 
(presumably the attackers, a select group of skillful warriors, 
were also relevant members of the elites, considering the 
association of status and warfare recognition). In the second 
scenario, a high number of elite members probably died in 
combat (casualties). However, in this case, we argue that most
of the victims were less important members of the elite since 
the nobility, if participating, should have enjoyed some extra 
protection during combat, e.g. they probably would not have 
been fighting in the frontlines. 

With the present research, we investigate the consequences 
of the two described types of warfare, in particular the effects 
of elimination of either nobility, or of less influential members 
of the elite, on the structure and functioning of elite society. 
First, we hypothesize that the removal of nobility would have 
impacted both the elite's social structure, as well as the ability 
of the government to exert its function (as measured by the 
extent of cooperation), whereas the removal of less influential 
members would be less disruptive. 

Second, we make a more specific prediction related to how 
the network structure is expected to change. In the case of the 
Maya, there was a transition towards a less hierarchical 
structure among the elites in which the ruler was gradually 
losing power during the Classic Period (Jackson, 2013), and 
this transition coincides with the increase of Maya warfare 
across the Classic Period. We therefore hypothesize that the 
increase in nobility-targeting warfare, in the context of normal 
fluctuations in population size, facilitated the emergence of less 
centralized political structures, as represented by the increasing 
importance of the Mayan royal court. 

Related work: Error and attack on sf networks 

In order to study the Mayan warfare scenario, we model
the interaction among elite individuals with the Prisoner's
Dilemma (PD), a widely used representation of social
dilemmas, i.e. situations in which individual success
(expressed as reward, or fitness in evolutionary terms) calls for
actions that harm collective wellbeing, and in which the
emergence of cooperation among selfish individuals is
therefore paradoxical (Axelrod, 1984). For this reason, the
PD serves as a metaphor of the elites' capacity to take decisions
that could lead their city to prosperity, as opposed to simply
personal reward. Regardless of whether the elite's cooperation
involves corruption, it would be impossible to keep centralized
power and sustain social order if the members of the elite did
not cooperate among themselves.

We will also investigate whether the social network structure
of the elites serves as a mediator for levels of cooperation. In
order to model the Maya elites' social structure, we use
scale-free (sf) networks, i.e. networks in which the degree
distribution of the nodes (the degree k being the number of
connections of one node to other nodes) follows a power law,
generated by evolutionary preferential attachment growth, where
a new node attaches to an old node according to its fitness
based on the outcomes of several rounds of the PD, one round
per neighbor (Poncela et al., 2008). Due to their heterogeneous
structure, sf networks have proven to be suitable models of
other archaeologically inferred social networks (Brughmans,
2012). The small size of the Maya elite, estimated at 1% or 2%
(Adams and Smith, 1977), implies a high concentration of power
in a few nodes, which is also consistent with the degree
distribution of sf networks and the rich-get-richer nature of
preferential attachment. This will serve as the starting point in
our simulations, after which we will perform systematic 
attrition of the two different types of elite members (where (1) 
fittest nodes represent influential elite members, i.e. nobility, 
and (2) less fit nodes represent less influential members of the 
elites). We will then analyze the effects of this attrition on 
cooperation within the social network, and on its structure. 

Application of the proposed methodology extends existing
research on the structural tolerance of sf networks to error and
attack, and its relation to cooperation. In terms of tolerance, the
structure of sf networks has been formally shown to be resilient
against random removal of nodes, i.e. error, but sensitive to
removal of the most connected nodes, i.e. attack
(Callaway et al., 2000; Cohen et al., 2000, 2001). In terms of
cooperation, sf networks have been shown to promote
cooperation (Santos and Pacheco, 2005); this cooperation is
robust to error, but it quickly decreases under attack, and the
decrement has been linked to a decline in network
heterogeneity (Perc, 2009), although the link is weaker in
dynamic networks (Ichinose et al., 2013; Poncela et al., 2008).

Previous simulations of these processes have concentrated on
the use of preferential attachment, where a new node attaches
to an old node according to its degree only (Barabasi and
Albert, 1999). Diverging from this, we will instead investigate
evolutionary preferential attachment (Poncela et al., 2008), as it
uses the nodes' performance (fitness) in playing the PD to decide
the attachment of new nodes. Moreover, although cooperation
has been shown to be more robust in dynamic networks, to our
knowledge none of the previous studies on attack have focused
on an underlying mechanism of growth and shrinkage of the
network based on node fitness, such as the fluctuation
model of Miller and Knowles (2015a, 2015b). In their model,
cooperation increased under attrition of nodes that were chosen
by applying a probability that favored the nodes with the least
fitness (and indirectly fewer connections).


In our application of their model, we would like to propose 
that the low attrition levels from Miller and Knowles (2015) 
can approximate the normal expected mortality rate among the 
elite, and that higher levels of attrition would then correspond 
to an increase of mortality due to casualties of warfare. More 
specifically, since attrition in Miller and Knowles (2015) was 
directed at the least fit, it would be representative of large-scale 
battles where mostly the less relevant members of the elite died, 
in contrast to raids conducted with the explicit aim of capturing 
or killing the nobility (most fit). 

Therefore, we first implement the Miller and Knowles model 
and replicate their results, focusing on interpreting the data 
within the Maya context. After this, we extend the model to test 
the effects of the removal of the fittest nodes (i.e., raids) when 
the fluctuation system (set at different casualty levels) is still 
present. In this case, we reverse the attrition's selection 
probability to now address the nodes with high fitness, which 
also tend to be the most connected ones since a higher fitness 
is more probable with a high number of connections. We can 
assume that in both scenarios non-elite members of society
were also negatively affected, but this is unlikely to have
impacted the structure of government and is not modeled
explicitly. According to existing theory, removing a few
highly connected nodes of the social network will have a bigger
impact than the removal of many of the less connected ones.
However, we will show that the fluctuation in network size
produces an equilibrium of network structure that allows the
persistence of cooperation.

Methods 

Our model is based on the fluctuation models described by
Miller and Knowles (2015a, 2015b), which
comprise alternating growing and shrinking phases (i.e. battles,
attrition of some of the least fit members) of the population. We
have additionally included raids, a mechanism of attrition of
some of the fittest nodes based on a tournament selection that
is analogous to theirs except that it selects the fittest nodes; both
attritions can operate constantly but at different rates. Similar
to theirs, our simulations keep a population size of around 1000 
agents (1009 is the maximum) because, given that elites among 
the Maya are estimated to represent 1% to 5% of the population, 
1000 elite members would correspond to a total population of 
between 20000 and 100000, which agrees with population 
estimates for Mayan cities. 



               B: Cooperate    B: Defect
A: Cooperate   1 \ 1           0 \ b
A: Defect      b \ 0           0 \ 0

Table 1: Payoff matrix for the weak Prisoner’s Dilemma.
Column 1 shows player A's strategy, and row 1 shows player
B's strategy. The payoffs for each combination of A's and B's
strategies are shown in the corresponding cell as A's payoff \
B's payoff, where b represents the temptation to defect.

An edge between two agents (nodes) represents that they
know each other, and therefore that there is the possibility of an
interaction between them: an engagement in the weak version
of the Prisoner's Dilemma (PD) game (following Miller and
Knowles' model implementation), in which each agent obtains
a payoff according to its own strategy and the strategy of its
rival. Table 1 shows the payoffs that agents A and B obtain
according to the two possible strategies that they can play, i.e.
cooperate or defect, as formulated in Nowak and May (1992).
We can imagine that cooperating nodes that have the largest
numbers of connections to other cooperators represent
high-status nobility among the elite.

The only parameter b represents the temptation to defect. In 
principle, the dilemma only exists when b > 1 because the 
strategy that gives the biggest payoff to one agent depends on 
what the other plays. Otherwise, for b < 1, the only rational
solution to obtain the maximum possible payoff, regardless of the
other agent’s strategy, is to cooperate. The temptation to defect
( b ) represents how competitive the situation is, e.g. it could 
represent the lack of water, in which case people would try to 
get as much as possible of it only for themselves before it runs 
out instead of sharing it with their group. 
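
The payoff matrix in Table 1 can be written as a small lookup. The following is a minimal sketch, not the authors' code; the function name `weak_pd_payoffs` and the boolean encoding of strategies are assumptions made only for illustration.

```python
def weak_pd_payoffs(a_cooperates, b_cooperates, b):
    """Return (payoff of A, payoff of B) for one round of the weak PD.

    `b` is the temptation to defect from Table 1. The name of this
    function and the boolean strategy flags are illustrative assumptions.
    """
    if a_cooperates and b_cooperates:
        return 1.0, 1.0   # mutual cooperation
    if a_cooperates and not b_cooperates:
        return 0.0, b     # A is exploited by B
    if not a_cooperates and b_cooperates:
        return b, 0.0     # A exploits B
    return 0.0, 0.0       # mutual defection


# Example: A cooperates, B defects, temptation b = 1.6
print(weak_pd_payoffs(True, False, 1.6))
```

Mutual defection and being exploited both pay 0, which is the property that makes this the weak version of the game.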

Following Miller and Knowles, all the simulations start with
one of two extreme cases of 3 agents that are either all
cooperators (CCC) or all defectors (DDD). This enables us to
observe the response of the model under the best and worst case
starting conditions. An iteration (t) of the simulation consists of
five steps:

1. Play PD. On each edge of the network, PD is played between
the two connected agents (neighbors of each other),
representing an interaction between two elite members. This
results in each agent playing against all its neighbors once, and
accumulating a fitness score equivalent to the sum of the
payoffs (r) obtained in all the games:

f_i = \sum_{j=1}^{k_i(t)} r_{i,j}    (1)


2. Update Strategies. Updating of behavioral strategies is
based on imitation of the most successful elite members, and
the implicit rule: cooperate with cooperators (or, defect with
defectors) if they are performing better. Each node i in the
network randomly selects another node j from its neighbors. If
the fitness of node i (f_i) is less than the fitness of the neighbor j
(f_j), then node i will change its strategy to the neighbor's
according to the following probability:

P_{i \to j} = \frac{f_j - f_i}{b \cdot \max(k_i, k_j)}    (2)


This probability is proportional to the difference between the
nodes' fitness scores; therefore agents that produced very low
fitness compared to their selected neighbor are more likely to
copy the neighbor's strategy. In order to obtain a probability,
the denominator normalizes the fitness difference by the
maximum possible difference between the two nodes given
their current degrees (k).
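
Steps 1 and 2 can be sketched together as follows. This is an illustrative sketch under assumptions: the dictionaries `strategy` and `neighbors`, the "C"/"D" labels and both function names are hypothetical, and the imitation probability is the fitness difference divided by b times the larger degree, as described in the text.

```python
def node_fitness(node, strategy, neighbors, b):
    """Step 1 (Equation 1): sum of weak-PD payoffs, one game per neighbor."""
    total = 0.0
    for other in neighbors[node]:
        if strategy[other] == "C":
            # A cooperator earns 1 from a cooperator, a defector earns b.
            total += 1.0 if strategy[node] == "C" else b
        # Playing against a defector always pays 0 in the weak PD.
    return total


def imitation_probability(f_i, f_j, k_i, k_j, b):
    """Step 2 (Equation 2): chance that node i copies neighbor j."""
    if f_j <= f_i:
        return 0.0  # only better-performing neighbors are imitated
    return (f_j - f_i) / (b * max(k_i, k_j))


# Hypothetical triangle: two cooperators (0, 1) and one defector (2).
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
strategy = {0: "C", 1: "C", 2: "D"}
b = 1.6
fitness = {n: node_fitness(n, strategy, neighbors, b) for n in neighbors}
print(fitness)
print(imitation_probability(fitness[0], fitness[2], 2, 2, b))
```

Because a node of degree k can earn at most b × k and at least 0, the ratio stays between 0 and 1 whenever b is at least 1.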

3. Grow network. In each iteration the elite grows by
including new members: newborns, kin or outstanding/skillful
commoners. 10 new nodes, each with a randomly selected
strategy (C or D), are each connected to the network by 2 edges
that are created according to the evolutionary preferential
attachment mechanism (Poncela et al., 2008). An existing node i
will be connected through one of the two edges to the new node
according to the following probability:

\Pi_i(t) = \frac{1 - \epsilon + \epsilon f_i(t)}{\sum_{j=1}^{N(t)} (1 - \epsilon + \epsilon f_j(t))}    (3)

Here, N(t) is the number of nodes available to connect, which
excludes both the 10 new nodes that are being added
in this step and any existing node already connected to the new
nodes (i.e. selection without replacement), and \epsilon \in [0, 1)
is a parameter that adjusts the selection pressure, i.e. the lower
the selection pressure \epsilon, the more probable it is that a
poorly performing node will get a connection to a new node. In
all our simulations we have set a high selection pressure of
\epsilon = 0.99, favoring the evolutionary preferential
attachment process.
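
A minimal sketch of the attachment rule in Equation 3 for a single newcomer follows. It is not the authors' implementation: the function names are hypothetical, and drawing the two targets without replacement for one new node is one possible reading of the description above.

```python
import random


def attachment_weights(fitness, epsilon):
    """Unnormalized weights 1 - epsilon + epsilon * f_i from Equation 3."""
    return {n: 1.0 - epsilon + epsilon * f for n, f in fitness.items()}


def choose_targets(fitness, epsilon, m=2, rng=random):
    """Pick m distinct existing nodes for one new node.

    Assumption: targets are drawn one at a time, removing each chosen
    node from the candidate pool (selection without replacement).
    """
    candidates = attachment_weights(fitness, epsilon)
    chosen = []
    for _ in range(min(m, len(candidates))):
        nodes = list(candidates)
        weights = [candidates[n] for n in nodes]
        pick = rng.choices(nodes, weights=weights, k=1)[0]
        chosen.append(pick)
        del candidates[pick]
    return chosen


# Hypothetical fitness values for three existing nodes.
print(choose_targets({0: 0.0, 1: 3.2, 2: 1.0}, epsilon=0.99,
                     rng=random.Random(1)))
```

With epsilon = 0.99 a node with zero fitness keeps a weight of only 0.01, so even a small positive fitness makes a node far more attractive to newcomers.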

4a. Battles. We changed the name attrition (used by Miller and
Knowles) to battles to easily distinguish it from the attrition of
relevant nodes, i.e. raids (Step 4b). If the population reaches a
size bigger than a specific value (1000 in our simulations), then
the network is shrunk by C% of its nodes. Each of these nodes is
the loser, i.e. the member with the least fitness, of a tournament
of S participants in which the payoffs are compared. The
participants are a randomly chosen 1% of the population, S =
1% × N(t). In case of ties, the loser is selected randomly from
among the ones that tied. The tournament is performed as many
times as necessary to obtain a group of losers equivalent
to C% of the population. Then they are removed from the
network together with their edges. Any disconnected nodes
resulting from this process are also removed. We note that
removals caused by battles resemble casualties (C), or
generally speaking mortality, during warfare in which the elite
were involved and in which the least fit members were more
likely to die. We also point out that when the tournament
involves one participant (S = 1), battles are equivalent to error
(random removal of nodes). Additionally, when S > 1, battles
always select among the least fit nodes (in the worst case, the
S-th fittest), whereas error does so only most of the time: in sf
networks the distribution of fitness, like that of connections, is
expected to be unbalanced, i.e. very few nodes concentrate
most of the reward and are therefore unlikely to be selected by
a uniformly random process. The tournament avoids the
selection of the fittest nodes, as raids (Step 4b) are responsible
for this selection.

4b. Raids. All the previous steps (i.e. Steps 1-4a) are
equivalent to those described in Miller and Knowles
(2015a, 2015b), but raids are an extension of Step
4a that we are adding to study the impact of nobility-targeted
raids. Like Step 4a, raids resemble existing
nomenclature, i.e. attack, except that they also contain a
tournament component. In contrast to the previous steps (1-4a),
which are performed every iteration, raids are performed every T
iterations, i.e. with a frequency of F = 1 / T. This means that
conflicts in which nobility are expected to die occur relatively
less frequently than deaths caused by
generalized warfare. The selection mechanism for the nobility
victims (V% of the population size) is analogous to the selection
mechanism in the battles step (Step 4a), except that in this case
the winners are removed instead of the losers; the winner of a
tournament is the participant with the most fitness instead of the
least. As with battles, any disconnected nodes resulting from
this process are also removed.
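
The tournament selection shared by Steps 4a and 4b can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the function name, the `size_share` parameter (1% in the text) and the choice to draw each tournament only from nodes not yet selected are all hypothetical details.

```python
import random


def tournament_removals(fitness, fraction, pick_fittest, rng=random,
                        size_share=0.01):
    """Select nodes to remove by repeated tournaments.

    fraction     -- share of the population to remove (C for battles,
                    V for raids), given as a number between 0 and 1
    pick_fittest -- False removes tournament losers (battles),
                    True removes tournament winners (raids)
    Assumption: already selected nodes do not enter later tournaments.
    """
    alive = list(fitness)
    n_remove = int(round(fraction * len(alive)))
    s = max(1, int(round(size_share * len(alive))))  # tournament size S
    removed = []
    for _ in range(n_remove):
        if not alive:
            break
        participants = rng.sample(alive, min(s, len(alive)))
        scores = [fitness[p] for p in participants]
        target = max(scores) if pick_fittest else min(scores)
        tied = [p for p in participants if fitness[p] == target]
        victim = rng.choice(tied)  # ties are broken at random
        removed.append(victim)
        alive.remove(victim)
    return removed


# Hypothetical population of 200 nodes whose fitness equals their id.
population = {i: float(i) for i in range(200)}
print(tournament_removals(population, 0.05, pick_fittest=False,
                          rng=random.Random(0)))
```

In a full simulation loop, battles would run whenever the population exceeds the size threshold, while raids would additionally be gated by a check such as `t % T == 0`.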

The main response variable of this model is the percentage
of cooperators (i.e. agents that have the 'cooperate' (C) strategy)
in time step t. To analyze the effects of battles (Step 4a) we
implemented our own version of this model. After getting
statistically different results - although qualitatively similar -
we compared Miller and Knowles' code (provided by the
authors) with ours, and found an important difference in the
way the strategies were updated. Their implementation
updated strategies asynchronously, i.e. each agent would
update its strategy s_i with a copy of the neighbor's strategy s_n
as soon as it met the conditions of Step 2. This produces a
situation in which an agent may transmit the updated strategy
s_n instead of the original s_i, which in our view
should be the one transmitted, because s_i is the strategy the
agent used to obtain the current reward that enters Equation 2.
Instead, our implementation updates the strategies
synchronously, i.e. each agent first evaluates what
its new strategy s_n should be without changing its current
strategy s_i until every agent knows its new strategy s_n for the
next iteration.
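
The synchronous scheme described above amounts to reading every neighbor's strategy from the current state and writing the result into a separate copy. The sketch below illustrates that idea only; the function name and the dictionary-based data layout are assumptions, not the authors' implementation.

```python
import random


def synchronous_update(strategy, fitness, neighbors, b, rng=random):
    """Return the strategies for the next iteration.

    All comparisons read from `strategy` (the current state); changes
    are written to a copy, so no agent can pass on a strategy it has
    only just adopted. An asynchronous variant would instead write the
    new value into `strategy` inside the loop.
    """
    new_strategy = dict(strategy)
    for i in strategy:
        if not neighbors[i]:
            continue
        j = rng.choice(neighbors[i])
        if fitness[j] > fitness[i]:
            k_max = max(len(neighbors[i]), len(neighbors[j]))
            p = (fitness[j] - fitness[i]) / (b * k_max)
            if rng.random() < p:
                new_strategy[i] = strategy[j]  # read the old state
    return new_strategy


# Hypothetical chain 0 - 1 - 2 with a defector in the middle, b = 2.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
strategy = {0: "C", 1: "D", 2: "C"}
fitness = {0: 0.0, 1: 4.0, 2: 0.0}
print(synchronous_update(strategy, fitness, neighbors, b=2.0,
                         rng=random.Random(0)))
```

In this example the imitation probability of both end nodes is exactly 1, so they switch to defection regardless of the random draws.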

Maya warfare: an experimental application 

Is cooperation reduced by an increment of battles during
warfare? Miller and Knowles' results indicated that battles were
able to increase cooperation compared to the absence of battles
when (1) the simulations started with a defector founded
network (DDD) and (2) battles were set at a low level and the
simulation started with cooperator founded networks (CCC).
However, in both scenarios, there was an inverse relationship
between battles and cooperation, i.e. the higher the casualties
(C) in battles, the less cooperation was achieved. In other words,
very low battle levels are able to boost cooperation, but
higher values start to negatively impact cooperation, although at
a rather slow rate.

Their study explored values of C from 0% (no battles) to 
50% (Miller and Knowles, 2015b). In our first experiment, we 
decided to expand this to values from 0% to 90% in increments 
of 10% while keeping the same values for the temptation b (1.0, 
1.3, 1.6, 1.9, 2.2, 2.5, 2.8, and 3.1). We also include interesting 
values of 0.1%, 0.5%, 1%, 2.5% and 5% because they 
approximate realistic figures based on current average annual 
mortality in different countries, including highly violent ones 
(United Nations, 2013). Our first experiment will (1) validate
our simulation, (2) report the new results after the correction
that updates strategies synchronously, (3) further
confirm the inverse relation between casualties and
cooperation, and (4) provide a comparison point for our second
experiment.

How is cooperation affected by the nobility-targeting raids
in scenarios with different casualty (C) rates? Our second
experiment includes raids (Step 4b) as part of the iteration. For
nobility victims (V), we explore the values 0.1%, 1% and 10%,
whereas for the frequency (F), we explore the values 1/10, 1/20,
1/40 and 1/80. The values were selected according to
exponential sequences for a broad exploration; for V, the
sequence corresponds to (10^{-k})_{k=1}^{3}, and for T (F = 1/T), to
(10 \times 2^k)_{k=0}^{3}. We also explored the model with different
levels of C (casualties): 0 (no battles), 0.1, 0.25, 0.5, 0.75, 1.0,
2.5, 5, 7.5, 10, 20, 30, 40, and 50. Finally, we kept the same
values of the first experiment for the temptation b.

Each configuration of parameters (scenario) runs for 2000 
iterations and it is repeated 50 times in order to reduce random 
effects. The two experiments allow us to explore four different 
conditions for cooperation to emerge: (1) no attrition ( battles or 
raids), (2) just battles , (3) raids without any battles , and (4) the 
combination of raids and battles. 
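
The parameter grid of the second experiment can be enumerated as in the sketch below. The values are taken from the text; the variable names, the tuple layout and the assumption that every configuration is run for both founder types (CCC and DDD) are illustrative.

```python
from itertools import product

B_VALUES = [1.0, 1.3, 1.6, 1.9, 2.2, 2.5, 2.8, 3.1]   # temptation b
C_VALUES = [0, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5,
            7.5, 10, 20, 30, 40, 50]                   # casualties, in %
V_VALUES = [0.1, 1.0, 10.0]                            # nobility victims, in %
T_VALUES = [10 * 2 ** k for k in range(4)]             # raid period T, F = 1/T
FOUNDERS = ["CCC", "DDD"]                              # assumed: both founders

REPETITIONS = 50
ITERATIONS = 2000

# One tuple (founder, b, C, V, T) per scenario that includes raids.
scenarios = list(product(FOUNDERS, B_VALUES, C_VALUES, V_VALUES, T_VALUES))

print(T_VALUES)
print(len(scenarios), "scenarios,", len(scenarios) * REPETITIONS, "runs")
```

Under these assumptions the grid contains 2688 raid scenarios, each repeated 50 times, in addition to the baseline runs without raids.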


Results 

The results of our first experiment were different from those
obtained by Miller and Knowles (Fig 1, 2a) due to our
synchronous mechanism of updating strategies (compared to
their asynchronous mechanism); however, qualitatively
speaking the results are very similar and their conclusions hold.
Figure 1 confirms that for scenarios that started with a group of
defectors (figure 1A), battles strongly favor cooperation but,
for scenarios that started with a group of cooperators (figure
1B), only small amounts of casualties improve cooperation.
Since the two figures (1A and 1B) are very similar for C ≥ 0.5,
it seems that battles eliminate the influence of the starting state.

Figure 1: Average percentage of cooperators for different
levels of casualties. The x-axis displays levels of temptation to
defect (b); the y-axis displays the average percentage of
cooperators. Each line color indicates one rate of
casualties, including 0%, which serves as a baseline and is
highlighted with a thicker gray line. Data points are averages of
the last 20 iterations (of 2000) for the 50 repetitions.

At the same time, we can also observe that when we further
increase casualties, cooperation decreases; however, the rate of
decrement is slow: levels of casualties below or equal to 30%
(C ≤ 30) are able to hold similar cooperation compared to no
casualties (C = 0) with cooperator founded networks (CCC).
For defector founded networks (DDD), all levels of casualties
proved to be better than no casualties. We did find that the
lowest level of casualties (0.1%) appears to be insufficient to
raise cooperation to the highest levels (lightest blue dotted line
in figure 1A).

The results obtained in the second experiment further
demonstrate the benefits of battles in terms of sustaining
cooperation; our model is able to sustain cooperation when we
systematically remove the fittest nodes of the network (raids).
In figure 2, we

Figure 2: Cooperators for different levels of Elite Attrition. The first and second rows present graphs for scenarios without
casualties (C = 0, top row) and with casualties (C = 2.5, bottom row), for scenarios with cooperator (left column) and defector (right
column) founded networks. Where casualties are present, C is set to 2.5%. The legend shows the different levels of nobility-victims
(V) and its frequency (F), i.e. V% / F. The baseline is the case in which there are no raids. Data points are averages of the last 20
iterations (of 2000) for the 50 repetitions.


show the results obtained for cooperator and defector founded 
networks (columns), and for scenarios without casualties (C = 
0) and with casualties of 2.5% (C = 2.5). We picked this value 
arbitrarily because we found that any other values of casualties 
between 0.5% and 20% showed very similar results (data not 
shown). Casualty levels below 0.5% (C < 0.5) are able to
sustain cooperation but not as well as those shown in figure 2,
whereas a steady decline of cooperation is observed for
casualties above 20% (C > 20).

The benefits of casualties become very evident when we
look into defector founded networks (figures 2A and 2C); in
fact, cooperation is boosted almost as much as if the network
had been founded by cooperators (with C = 2.5%), as the
results in figures 2C and 2D are hardly distinguishable from
each other. There are also substantial benefits of casualties for
cooperation in cooperator founded networks. When we
directly compare the different levels of raids (individual dotted
lines) between figures 2C and 2D, we observe that cooperation
holds much better when casualties are present, e.g. even the
lowest rates of raids (green lines) affect cooperation in the
scenario without casualties, whereas it takes middle rates (red
lines) of raids when battles are present.

In terms of network structure, we should expect some
changes since we are trimming relevant (connected) nodes of
the networks. In figure 3, we illustrate these changes
qualitatively with examples that show the internal
behavior of the model for two interesting scenarios where the
temptation to defect is at a safe value (b = 1.6), i.e. we observe



Figure 3: Network structure without raids and with raids.
Two networks obtained after 2000 generations from one run of
the simulation (arbitrarily the 1st run out of 50 repetitions) of
two scenarios: left, without raids, and right, with raids (V =
0.1% and T = 10), both with C = 2.5%, b = 1.6 and a defector
founded network (DDD). The red dots represent cooperators, and
the black dots defectors. The node size is scaled relative to the
most connected node of both scenarios (k = 89); therefore size is
comparable across graphs.

a clear convergence towards cooperation. In a defector founded
network (DDD), we compare the first repetition (out of 50) that
was performed without nobility victims, V = 0% (left), and the
first repetition from the ones that were performed with V =
0.1% and T = 10 (right). We can visually notice the structural
difference between them: the one without nobility victims (left)
is more edge-dense and contains a few highly connected nodes
(biggest nodes), which is characteristic of sf networks,
whereas the one with nobility victims (right) presents fewer
edges, and its most connected nodes are almost indistinguishable
due to their small size, which is relative to the biggest node of
both graphs.

In order to check whether the graphs in figure 3 represent
sf networks we used the Python powerlaw package (Alstott et
al., 2014). This package is able to statistically test whether a
distribution follows a power law. For the left graph, we can
establish a statistical difference (p = 0.028) against the
assumption of an exponential (null hypothesis); therefore it is
very likely that it is an sf network. For the right graph, we are
not able to find a difference (p = 0.958).
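
The kind of comparison described above, a power law against an exponential alternative, is commonly done with a log-likelihood ratio test. The sketch below is a simplified stand-in written with the standard library only; it does not reproduce the powerlaw package. It assumes continuous distributions and a known lower cutoff `xmin`, and both function names are hypothetical.

```python
import math
import random


def compare_power_law_exponential(data, xmin):
    """Return (R, p) for a power-law versus exponential comparison.

    R > 0 favors the power law, R < 0 favors the exponential; p is the
    significance of the sign of R. Assumptions: continuous
    approximations and a fixed xmin, with at least some values above
    xmin so that the estimates are defined.
    """
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    mean_log = sum(math.log(x / xmin) for x in tail) / n
    alpha = 1.0 + 1.0 / mean_log           # power-law exponent (MLE)
    lam = 1.0 / (sum(tail) / n - xmin)     # shifted-exponential rate (MLE)
    diffs = []
    for x in tail:
        ll_power = math.log((alpha - 1.0) / xmin) - alpha * math.log(x / xmin)
        ll_expon = math.log(lam) - lam * (x - xmin)
        diffs.append(ll_power - ll_expon)
    R = sum(diffs)
    mean_d = R / n
    sigma = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / n)
    p = math.erfc(abs(R) / (math.sqrt(2.0 * n) * sigma))
    return R, p


def sample_power_law(n, alpha, xmin, rng):
    """Draw n synthetic power-law values by inverse-transform sampling."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


rng = random.Random(42)
synthetic = sample_power_law(2000, alpha=2.5, xmin=1.0, rng=rng)
print(compare_power_law_exponential(synthetic, xmin=1.0))
```

For real degree sequences a discrete treatment and an estimated cutoff would be more appropriate, which is part of what a dedicated package provides.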


V      |  0% |        0.1%        |         1%         |        10%
b / F  |   0 |  80   40   20   10 |  80   40   20   10 |  80   40   20   10
1      |  85 |  49   21    7    7 |  12   16   13    8 |  18    1    3   88
1.3    |  85 |  49   33    7    3 |   9   13    5    0 |   5   16    7   90
1.6    |  80 |  51   33   14    1 |   1    6   27   50 |  63    7   37   91
1.9    |  94 |  87   78   71   80 |  75   75   73   15 |  27   61   85   97
2.2    |  88 |  84   86   85   80 |  77   76   44   64 |  66   89   89   93
2.5    |  85 |  87   87   85   78 |  79   85   87   87 |  86   90   95   89
2.8    |  96 |  96   94   95   89 |  90   86   91   95 |  87   93   92   82
3.1    |  96 |  99   98   96   91 |  93   93   89   94 |  82   91   85   77


Table 2: Total of sf networks produced by each scenario.

The bolded cells represent parameters of each scenario,
including cooperator and defector founding populations, when
casualties (C) are set to 2.5%; the first column (starting at the
3rd row) shows the temptation values (b), the first row the
nobility victims (V) and the second row its frequency (F), i.e. the
number of iterations after which the network is pruned. Each of
the non-bolded cells presents the number of networks (out of
100) that proved to be sf (p < 0.05). The color gradient is applied
according to the number of sf networks produced: cells with
fairer tones show the scenarios in which fewer networks proved
to be sf networks, whereas the red tones show the ones in which
more networks proved to be sf.

In table 2 we present the number of networks that passed this
statistical test (p < 0.05) for each scenario in order to show that
the networks presented in figure 3 are not isolated cases. In the
table, we merged the results for cooperator and defector
founded networks since they were similar. As
suggested by Miller and Knowles, we confirm that their
fluctuation model using evolutionary preferential attachment
without nobility victims (second column) generally produces sf
networks (80% or more for all temptation values). With a few
exceptions, the majority of networks were unable to pass the
power law test when raids were present and the temptation was
below 1.9; sf networks are frequently found again for
temptation b >= 1.9. For b > 1.9, the structural change could be
associated with a decline of cooperation; however, for b = 1.9,
we still have multiple cases in which cooperation holds (for
V = 0.1% and V = 1% / F ∈ {40, 80}) and yet the structure fits that
of an sf network. In terms of the network size, we found that
when nobility victims (V) was set at 10%, the average size of 
the final networks was always below 905 nodes. This suggests 
that the growth phase was not fast enough to recover the 
network, but also that the raids completely isolated many nodes 
that are also removed in Step 5; in this sense, we also observed 
that in these cases there were generally multiple components. 

Discussion 

We showed that the fluctuation model presented by Miller and 
Knowles improves the cooperation robustness against removal 
of the most connected nodes (attack-like mechanism), in this 
case selected by tournament (raids). This kind of node removal 
directly targets the heterogeneity of sf networks, which has 
been argued to dominate the fate of cooperation. We 
numerically showed that this is not necessarily the case, and 
that cooperation can persist under moderate levels of raids if 
there is a mechanism that allows for the network to readjust its 
ties. Surprisingly, sf network structures reappear when 
cooperation starts declining. The main reason for this seems to 
be that most of the nodes have no reward (i.e. fitness) in highly 
defector-composed networks; therefore some minimal reward 
(due to random chance) becomes very advantageous for 
attracting new nodes (see equation 3). Some of the new nodes will 
be cooperators (approximately half of them) that will keep the 
initial advantage propagating to the next generations. Conversely, 
when cooperation is very high, the rewards are better 
distributed among the nodes, and so are the possibilities of 
getting new connections. 

Methodologically speaking, we presented a parametrized 
attack mechanism (raids) that can be set at different rates; 
although it does not necessarily remove the most connected 
nodes, these nodes are the most likely to be removed. Given the 
sensitivity of sf networks to attack, this is a more cautious 
approach to studying the resilience of cooperation under removal of 
important nodes. In this sense, battles has the advantage over 
error that intentionally avoids the removal of the fittest nodes 
(raids). That said, further research should explore the presented 
model under traditional forms of attack (without the 
tournament) or even more sophisticated forms of it (Morone 
and Makse, 2015). Similarly, it is also important to evaluate 
smaller sizes (S) of battles tournament, including S = 1 
(equivalent to random removal without the tournament, i.e. 
error) because it is a more realistic representation of mortality 
in societies. The model should be extended so that the agents 
recognize specific individuals (e.g. by using a history of 
interactions with each neighbor), leading to the use of a 
particular strategy towards each neighbor instead of reacting 
uniformly depending on the fitness of a randomly chosen 
neighbor (Step 2). 

Regarding the Maya warfare, we were able to replicate 
scenarios in which cooperation persists for indefinite time in 
spite of nobility-targeting raids, which explains why the Maya 
political collapse around AD 800 isn’t directly associated with 
this kind of attack. This collapse could be explained if the 
raids had increased leading up to that time, which is 
consistent with evidence in the Petexbatun area (Webster, 
2000), though in this particular region the large increment of 
battles could have played a role as well. For other areas where 
we lack evidence for elevated warfare, our model favors the 
hypothesis that additional factors could have entered into play 
at the end of the Maya Classic period that increased the 
temptation to defect (b), e.g. environmental or economic crisis, 
or land degradation. 
The model also allows us to venture the hypothesis that the 
nobility-targeting raids might have contributed to the 
emergence of a less hierarchical organization among the elites 
during the Maya Classic, thus supporting the relation between 
increased warfare and a more decentralized political hierarchy 
pointed out in the literature. We appeal to archeologists to 
verify if our results and new hypotheses are consistent with and 
helpful to explain the events of the Maya Classic. 
Introduction People in some societies tend to put a 
greater value on equality in distribution of resources even 
if they have to pay expensive court costs to achieve it, while 
people in some other societies tend to aim at a maximal share 
(the whole) but withdraw readily if any conflict occurs. The Nash 
demand game (NDG) is a one-shot two-player game that 
has been widely used for modeling such bargaining situations 
in computational and game theoretic approaches. Each 
player simultaneously demands a portion of some good. If 
the total amount demanded by the players is less than or equal 
to the available good, each player obtains the amount it 
requested. Otherwise, neither player gets anything. Whereas 
the studies using NDG can account for why people favor 
the equal distribution (Skyrms (1996)), it is too simple to 
deal with various distributive norms. We use the 
demand-intensity game (DIG), which adds a psychological factor to 
NDG while maintaining such simplicity that it can be analyzed 
with the concepts and tools of game theory (e.g., 
Kojima and Arita (2012)). 

The goal of this study is to clarify the origin and evolutionary 
dynamics of distributive norms using DIG. Previous 
studies have shown that population structures tend to promote 
cooperative behavior by means of cooperative clustering 
and assortative interactions. We perform an evolutionary 
simulation focusing on the effect of the population 
structures on the evolution of distributive norms. We show the 
surprising result that network structures significantly change 
the evolutionary scenario. A population distributed over 
a regular network tends to evolve a strong equality norm. 
However, as the number of random links in the network increases, 
we more often see a scenario in which the population is occupied by 
monopolists who ask for the whole but with a moderate or 
timid intensity. We also find that network structures with 
some intermediate randomness create an interesting scenario 
in which several norms emerge in a cyclic manner. 

Model DIG is a one-shot game between two players. Each 
player has values of d and i (0 ≤ d, i ≤ 1) as a strategy 
S(d, i): S(d_0, i_0) for player 0 and S(d_1, i_1) for player 1. d 
indicates how large a portion of the resource she wants and 
i indicates the intensity of the demand, as shown in Fig. 1. 


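[Figure 1 shows two scales running from 0 to 1. The amount of demand d: generous (d < 0.5) to greedy (d > 0.5), with unselfish at d = 0, even at d = 0.5, and selfish at d = 1. The intensity of demand i: timid (i < 0.5) to bold (i > 0.5), with wimpy at i = 0, moderate at i = 0.5, and belligerent at i = 1.] 

Figure 1: Representation of strategies. 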


For example, a strategy with d = 1 is described as “selfish” 
while a strategy with i < 0.5 is described as “timid.” 

Player 0’s payoff is defined by equations (1)-(3). 
If the aggregated demand of both players does not exceed 
the total resource 1, each player obtains her demand d as 
a payoff without a conflict. Otherwise, each player obtains 
the tentative payoff tp reduced by the conflict cost, which is 
defined as the mean of the two intensities. The tentative payoff 
is computed as follows. Player 0’s demand d_0 is separated into two parts: 
the directly obtained part (1 − d_1) and the remaining overlapped 
part (d_0 + d_1 − 1). The latter is divided in a ratio based 
on the difference between the players’ intensities (1 + i_0 − i_1 : 
1 + i_1 − i_0). 


payoff = \begin{cases} d_0 & (\text{if } d_0 + d_1 \le 1), \\ tp \cdot (1 - cost) & (\text{otherwise}), \end{cases}    (1) 

tp = (1 - d_1) + (d_0 + d_1 - 1) \cdot \frac{1 + i_0 - i_1}{2},    (2) 

cost = \frac{i_0 + i_1}{2}.    (3) 

There are two typical strategies: d = 0.5 and d = 1. 
Although debatable, we simply associate the former with 
“egalitarianism” and the latter with “libertarianism.” The 
ideal society in the sense of equality and efficiency (everyone 
receives 0.5 in every game) can be achieved not only by the 
egalitarian norm S(0.5, *) but also by the “wimpy” libertarian 
norm S(1, 0). 
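
The payoff rule of equations (1)-(3) can be sketched as a short function. This is only an illustrative reading of the definitions above: the function name `dig_payoff` is hypothetical, and the division of the overlapped part by 2 is inferred from the stated ratio (1 + i_0 − i_1 : 1 + i_1 − i_0), whose two terms sum to 2.

```python
def dig_payoff(d0, i0, d1, i1):
    """Payoff of player 0 in the demand-intensity game (sketch of eqs. 1-3).

    d0, d1: demands in [0, 1]; i0, i1: intensities in [0, 1].
    """
    if d0 + d1 <= 1:
        # No conflict: player 0 simply receives her demand.
        return d0
    # Share of the overlapped part, from the ratio (1 + i0 - i1) : (1 + i1 - i0).
    share = (1 + i0 - i1) / 2
    tentative = (1 - d1) + (d0 + d1 - 1) * share
    # Conflict cost is the mean intensity of the two players.
    cost = (i0 + i1) / 2
    return tentative * (1 - cost)


# Two "wimpy" libertarians S(1, 0) split the resource evenly at no cost.
print(dig_payoff(1, 0, 1, 0))
# Two moderate libertarians S(1, 0.5) lose half of their tentative payoff.
print(dig_payoff(1, 0.5, 1, 0.5))
```

Under this reading, a population of S(1, 0) players and a population of S(0.5, *) players both yield 0.5 per game, which matches the remark about the two norms that achieve the ideal society.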

Figure 2: Distribution of the evolving strategies from the 1000th 
to the 1500th generation as a function of the rewiring probability 
p, averaged over 100 trials (w = 1). Each area corresponds 
to a strategy value, sorted in order of increasing 
value. 

Social networks in our model are represented as a Moore 
neighborhood structure on a toroidal square lattice consisting 
of 100 x 100 nodes, each of which has a player. Networks 
are constructed by rewiring each link with a probability 
p. In each generation of evolution, each player plays DIG 
with its directly connected neighbors. The fitness of player j 
is defined as exp[w π_j], in which w represents the intensity of 
selection and π_j represents the average payoff. Each player 
adopts as its next strategy one selected fitness-proportionally 
from its own and all its neighbors’ strategies, and then 
changes d and i to a random value, each with a low probability 
of 0.05. 
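
A minimal sketch of this update step may help make the rule concrete. It assumes that each candidate (the player itself and each neighbor) is given as a pair of a strategy (d, i) and its average payoff, and that mutation draws uniformly from a grid of 0.1 resolution; the function name `next_strategy` and its parameters are hypothetical and not taken from the authors' implementation.

```python
import math
import random

# Strategy values on a 0.1 grid from 0 to 1 (assumed mutation range).
GRID = [k / 10 for k in range(11)]


def next_strategy(candidates, w=1.0, mutation_rate=0.05, rng=random):
    """Pick the next strategy of one player.

    candidates: list of ((d, i), average_payoff) for the player itself
    and all of its neighbors.
    """
    strategies = [strategy for strategy, _ in candidates]
    # Fitness is exp(w * average payoff); w is the intensity of selection.
    weights = [math.exp(w * payoff) for _, payoff in candidates]
    d, i = rng.choices(strategies, weights=weights, k=1)[0]
    # d and i mutate independently, each with a low probability.
    if rng.random() < mutation_rate:
        d = rng.choice(GRID)
    if rng.random() < mutation_rate:
        i = rng.choice(GRID)
    return d, i


candidates = [((0.5, 0.5), 0.5), ((1.0, 0.0), 0.4), ((1.0, 0.5), 0.2)]
print(next_strategy(candidates, w=1.0, rng=random.Random(0)))
```

With a large w the exponential weights concentrate on the best-scoring candidate, which is one way to see why strong selection changes the outcome.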

Effects of social structure We conducted evolutionary 
simulations in which d and i have discrete values from 0 
to 1 with a 0.1 resolution. Figure 2 shows the distribution 
of the evolving strategies from 1000th to 1500th generations 
(w = 1). Although it is not clear from this figure, basically 
just one strategy always occupied the population as a norm 
except for some region of p. 

We see a clear tendency that, as the random links increase, 
egalitarianism disappears and libertarianism 
grows instead until it always occupies the population. We also 
found that egalitarianism was coupled with various intensities, 
S(0.5, *), and libertarianism was coupled mainly with 
the moderate intensity, S(1, 0.5). 

Furthermore, we found that in a few trials network structures 
with some intermediate randomness (p ~ 0.06) create 
an interesting scenario in which several norms (including 
timid or moderate libertarianism and bold or moderate egalitarianism) 
emerge in a cyclic manner, as shown in Fig. 3. It 
is similar to the cyclic dynamics emerging from voluntary interactions 
in the public goods game or the prisoner’s dilemma 
(Hauert et al. (2002), Suzuki et al. (2008)). 

Figure 3: A typical evolutionary trajectory of strategies averaged 
over each population (one axis is labeled "average intensity"). 
Starting from (0.5, 0.5) as the mean 
values of a random population, it drew clockwise circles 
composed of (1) a growth of the number of libertarians, (2) 
an increment of the intensities, (3) an increase of the number 
of bold egalitarians and (4) a decrement of the intensities. 

In addition, when using networks with strong spatial locality 
(small p), strong selection (large w) favored the coexistence 
of egalitarianism and timid libertarianism. Timid or 
wimpy libertarians, by obtaining a high payoff against any 
strategy, could survive under strong selection even though the 
network structure is advantageous for egalitarianism. 

Conclusion We demonstrated that the population structure 
could strongly affect the evolution of distributive norms by 
performing evolutionary simulations using an extended 
version of the Nash demand game. Specifically, we showed 
that spatial locality favors equality seekers while randomness 
in the networks favors “moderate” or “timid” monopolists. 
This result might have significant implications for us, 
living in a world where an increasing number of people are 
connected to each other through social networking, although 
our tendency to connect with similar others should be taken 
into consideration. 
Abstract 

Computational simulation of language evolution provides 
valuable insights into the origin of language. Simulating the 
evolution of language among agents in an artificial world also 
presents an interesting challenge in evolutionary computation 
and machine learning. In this paper, a “jungle world” is 
constructed where agents must accomplish different tasks such 
as hunting and mating by evolving their own language to 
coordinate their actions. In addition, all agents must acquire the 
language during their lifetime through interaction with other 
agents. This paper proposes the algorithm of Evolutionary 
Reinforcement Learning with Potentiation and Memory (ERL- 
POM) as a computational approach for achieving this goal. 
Experimental results show that ERL-POM is effective in 
situated simulation of language evolution, demonstrating that 
languages can be evolved in the artificial environment when 
communication is necessary for some or all of the tasks the 
agents perform. 

Introduction 

Highly efficient and low-cost computer systems have made 
computational simulation possible at an unprecedented scale 
in recent decades. In the specific field of language evolution, 
computational simulation provides a complementary 
methodology that can help researchers develop detailed 
hypotheses on language origins and evolution and test these 
hypotheses in the virtual laboratory of simulation (Cangelosi 
and Parisi, 2002). Furthermore, from a technical perspective, 
an understanding of the fundamental principles in language 
evolution may lead to innovative machine learning algorithms 
and communication methods that are applicable to interactive 
software agents and multi-agent systems (Wagner, Reggia, 
Uriagereka, and Wilkinson, 2003). 

Language is a powerful tool that helps humans coordinate 
actions to accomplish various tasks. It is also a skill that is 
acquired through lifetime learning. The purpose of this paper 
is therefore to establish a simulation framework, i.e. an 
artificial world and a computational method that captures 
these important features into a simulation of language 
evolution. Such a framework should then make it possible to 
gain new insights into evolution of natural and artificial 
language. 

The first part of the simulation framework, “the jungle 
world”, is an artificial environment in which agents attempt to 
hunt and mate through coordinated actions. Initially, the agent 
population has neither knowledge of the rules of the 
world nor an existing code of communication. Through 
generations of evolution, the agent population must develop 
their own language and learn to use that language to 
coordinate their hunting and mating efforts. Additionally, for 
each agent, the language and the behavioral policy in the 
artificial world must be acquired through interaction with 
other agents and the environment during lifetime. Thus, the 
goal of evolution is to (1) evolve a language, (2) evolve it in 
service of coordinated behavior, and (3) evolve the ability for 
individuals to acquire it during their lifetime. 

To allow efficient simulation of language evolution in the 
jungle world, a biologically-inspired algorithm, Evolutionary 
Reinforcement Learning with Potentiation and Memory 
(ERL-POM), is proposed. This approach utilizes a genetic 
algorithm to configure reinforcement learner units. State- 
action memory and potentiation are introduced to balance 
exploitation with exploration and improve interactive learning 
ability. 

Using the proposed algorithm, language evolution and 
acquisition are simulated under a variety of settings of the 
jungle world. These settings include the scenario where 
communication is necessary for both tasks, one of the tasks, 
or neither of the tasks. The paper also presents and analyzes 
samples of the artificial languages evolved in different 
settings. Experimental results and analysis show that ERL- 
POM is effective in simulating language evolution and 
acquisition, demonstrating that languages can be evolved and 
acquired in the artificial world if communication is necessary 
for one or both of the tasks. 

The remaining sections of the paper are organized as 
follows. The next section gives a brief review on prior work 
in computational simulation of language evolution. The third 
section introduces rules and settings of the jungle world, and 
the fourth section provides details on the algorithm. The fifth 
section presents and analyzes experimental results, and the 
sixth section points out potential directions for future work. 

Prior Work 

In a typical simulation of language evolution, a multi-agent 
system is created to simulate an entire population of agents. 
Each agent acquires a shared communication system 
through machine learning methods, a simulated 
evolutionary process, or both. 

Simulations of language evolution can be divided into 
situated and non-situated simulations. In a non-situated 
simulation, an agent’s actions consist solely of sending and 
receiving signals. Such non-embodied agents perceive objects 
and events, but do not change the state of the environment. 
Usually, the agents aim at encoding an arbitrary meaning as a 
signal and send it to another agent, who decodes the signal 
back to a meaning. In such simulations, neural networks 
(Batali, 1998; Kvasnicka and Pospichal, 1999; Smith 2002), 
lookup tables (Kaplan 2000; Smith 2001), associative 
memories (De Boer and Vogt, 1999; Steels and Oudeyer, 
2000) and finite state machines (MacLennan and Burghardt, 
1993; Brighton 2002) are the most commonly used models to 
represent the behaviors. While they have been employed to 
demonstrate many interesting properties of communication 
systems, non-situated simulations are unrealistic in that they 
do not associate external tasks with communication actions. In 
contrast, the evolution of language in nature is strongly linked 
to the need to perform various tasks in which communication 
helps. 

To address this problem, situated simulations can be built. 
In such a simulation, agents are embodied in an artificial 
world. Their goal is usually to accomplish tasks that require 
cooperation or competition among multiple agents. Thus, 
language serves as a necessary or beneficial tool to achieve 
higher performance in multi-agent tasks. Situated simulations 
can be used to test specific assumptions on the role of certain 
behaviors or environmental factors in the evolution of 
language (Quinn 2001; Mirolli and Parisi, 2010; De Greeff 
and Nolfi, 2010; Mitri, 2010; Rawal, Boughman, and 
Miikkulainen, 2014). 

However, prior work on situated simulations is limited in 
two ways. First, most of it focuses on a single task. More 
specifically, the rewards of actions, be they communicative or 
non-communicative, do not change throughout the lifetime of 
agents in all generations. In contrast, in nature, language is 
used for numerous tasks, and the rewards of actions depend on 
multiple factors. Second, the language is usually encoded 
genetically and passed on to the next generation through 
genotypes. In contrast, language in nature is acquired during 
lifetime learning and passed on to the next generation through 
interaction among individuals in the environment. While some 
of the existing work addresses one of the above problems, to 
the best of our knowledge, no prior work on situated simulations 
evolves artificial languages that are both applicable to 
different tasks and acquired through lifetime learning. 
Therefore, the purpose of this work is to introduce a 
simulation framework that achieves both goals. Such a 
framework makes the simulation more realistic and should be 
helpful in discovering deeper insights into the origin and 
evolution of language. 

Simulation Environment 

This section introduces the rules and settings of the jungle 
world - the artificial environment used in the experiments. 
The goal is to establish a paradigm of situated simulation 
environment where languages evolved are used in different 
tasks at different stages of an agent’s life, and knowledge must 
be acquired through lifetime learning. While only a few 
variations of the jungle world are used in the experiments, the 
simulation environment can be modified to serve many other 
experimental goals. In addition, the jungle world does not 
impose any requirement on the artificial controller of the 
agents except for an interface that specifies inputs and outputs. 
Hence, it can be viewed as a general test environment for 
evaluating the performance of genetics-based machine learning 
algorithms. 

Life in the Jungle World 

This subsection presents basic concepts and rules in the virtual 
world. The focus is on actions and rewards during a single 
generation. Concepts and rules are organized into entries with 
short definitions and descriptions. 

Step and Trial. Time is discretized into steps. At each step, 
agents receive inputs from the artificial environment including 
messages from their partner, and take actions accordingly. A 
trial is a 10-step experiment with two agents. It terminates 
early if any of the participants receive a positive or negative 
reward. 

Jungle. The jungle is the place where agents hunt for prey and 
feed themselves. However, if an imprudent agent enters the 
jungle without its partner at any step, it will be hurt and receive 
a negative reward. 

Agent. An agent has two integer states: fitness and position. 
Fitness ranges from 10 to 200, with 10 the initial value for 
new-born agents. Fitness increases by 10 after a successful 
hunt (defined later) and decreases by 1 after each trial. 
Position ranges from 0 to 5, indicating the distance between 
an agent and the jungle. 

An agent senses its proximity to the jungle and becomes 
alert if its position is 1. It becomes ready for mating if its 
fitness is greater than 100. 

At each step, an agent may decide to take the following 
actions: (1) move towards the jungle, (2) attempt to mate, and 
(3) send a two-bit message. 

Thus, in the typical setting of the jungle world, an agent’s 
brain, i.e. the controller, receives a four-bit input at each time 
step: position alert, mating readiness, and message bits 1 and 2. 
Based on the input, the controller makes a four-bit decision, 
indicating whether the agent decides to move towards the 
jungle, to mate with its partner, or to set message bits to one. 
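
The controller interface described above can be sketched as follows. The bit order and the helper names `encode_input` and `decode_output` are assumptions made for illustration; the text fixes only the thresholds (alert at position 1, ready for mating at fitness above 100) and the four-bit width of inputs and outputs.

```python
def encode_input(position, fitness, received_message):
    """Build the four-bit input seen by a controller at one step.

    received_message: two bits sent by the trial partner.
    The bit order is an assumption of this sketch.
    """
    alert = 1 if position == 1 else 0      # agent is next to the jungle
    ready = 1 if fitness > 100 else 0      # agent is ready for mating
    return (alert, ready, received_message[0], received_message[1])


def decode_output(bits):
    """Interpret a four-bit decision: move, mate, and two message bits."""
    move, mate, msg1, msg2 = bits
    return {"move": bool(move), "mate": bool(mate), "message": (msg1, msg2)}


print(encode_input(position=1, fitness=120, received_message=(0, 1)))
print(decode_output((1, 0, 1, 1)))
```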

Hunting. An agent succeeds in hunting if it enters the jungle 
with its trial partner at the same time step. A successful hunt 
gives a positive reward and increases fitness by 10 for both 
agents. 

Mating. If a pair of agents decides to mate at the same step, 
and both of them are ready, they succeed in mating and receive 
a positive reward equal to 1/10 of their partner’s fitness. Thus, 
successful mating always claims more rewards than hunting, 
especially so when fitness is high. 

If an agent decides to mate when its fitness does not exceed 
100, it receives a negative reward for cheating its partner. If 
an agent decides to mate while its partner is not ready, it is 
embarrassed and receives a negative reward, too. 

Idling. If a pair of agents claim no reward at the end of a trial, 
they receive a negative reward for wasting time. 

Population. The population in all generations contains 50 
agents with 25 seniors and 25 juniors (except for the first 
generation in which no seniors exist). A senior is an agent who 
survived the selection process after the previous generation. A 
junior is a newborn agent whose parents are a pair of senior 
agents. 

Generation. A generation contains two phases: parenting and 
socializing (except for the first generation). In the parenting 
phase, junior agents pair with each of their parents for 100 
trials. Since every junior has two parents, a total of 5000 trials 
are conducted in the parenting phase. In the socializing phase, 
every pair of agents in the current generation participates in 
100 trials with random ordering of partners. Thus, a total of 
122,500 trials are conducted in the socializing phase. Rewards 
accumulated from the socializing trials are used as the 
performance measure for all agents. 
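
The two trial counts follow directly from the population sizes, as the short calculation below shows. The variable names are illustrative only.

```python
from math import comb

juniors, seniors = 25, 25
trials_per_pair = 100

# Parenting: each junior is paired with each of its two parents.
parenting_trials = juniors * 2 * trials_per_pair

# Socializing: every unordered pair among all 50 agents.
socializing_trials = comb(juniors + seniors, 2) * trials_per_pair

print(parenting_trials)    # 5000
print(socializing_trials)  # 122500
```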

Selection. After each generation, agents are ranked based on 
their performance. The top 25 eligible agents survive the selection 
and become the senior agents in the next generation. An agent 
is eligible as long as its life spans fewer than four generations. 
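
The selection rule can be sketched as a filter followed by a ranking. The dictionary keys `performance` and `generations_lived`, and the way age is counted, are assumptions of this sketch rather than details given in the text.

```python
def select_seniors(agents, n_survivors=25, max_generations=4):
    """Return the top-ranked eligible agents.

    agents: list of dicts with hypothetical keys "performance"
    (cumulative socializing reward) and "generations_lived".
    """
    # An agent is eligible while it has lived fewer than four generations.
    eligible = [a for a in agents if a["generations_lived"] < max_generations]
    ranked = sorted(eligible, key=lambda a: a["performance"], reverse=True)
    return ranked[:n_survivors]


population = [
    {"id": k, "performance": float(k), "generations_lived": 4 if k == 49 else 1}
    for k in range(50)
]
survivors = select_seniors(population)
print(len(survivors), survivors[0]["id"])
```

In this toy population the best performer has reached the age limit, so the agent ranked second becomes the top survivor.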

Reproduction. Before the next generation starts, selected 
agents must participate in the reproduction process to produce 
junior agents for the next generation. While the artificial world 
does not impose any requirement on the mechanisms in which 
genotypes of selected agents are used to generate new agents, 
the following approach is employed in the experiments. 

Each of the 25 selected agents is paired with a randomly 
chosen partner. Junior agents are then constructed from them 
through mutation and crossover. Thus, every selected agent 
has at least one child, but may have two or more children due 
to random pairing. Details on genotypes, mutation, and 
crossover are presented in the “Method” section. 
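
The pairing step can be sketched as below. It assumes that the random partner is drawn from the other selected agents (never the agent itself) and that each pairing yields exactly one junior; neither detail is stated explicitly, and `pair_parents` is a hypothetical helper name.

```python
import random


def pair_parents(selected, rng=random):
    """Pair every selected agent with a randomly chosen other selected agent."""
    pairs = []
    for index, parent in enumerate(selected):
        others = selected[:index] + selected[index + 1:]
        pairs.append((parent, rng.choice(others)))
    return pairs


seniors = list(range(25))
pairs = pair_parents(seniors, rng=random.Random(0))
# Every senior initiates one pairing, so each has at least one child;
# seniors picked as partners more than once have additional children.
print(len(pairs))
```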

Language Acquisition Requirement (LAR). Junior agents 
in all generations as well as agents in the first generation must 
NOT have any knowledge of the rules of the world or the 
language that is used in it. All agents must acquire such 
knowledge by participating in trials. The genotype of an agent 
defines how it learns, rather than representing such knowledge 
directly. 

Implications to Language Evolution 

As defined in the previous subsection, the jungle world is an 
artificial environment for situated simulations of language 
evolution. This subsection gives a brief discussion on two 
important features of this artificial environment and points out 
their implications. 

Multitasking. Jungle world requires artificial agents to evolve 
languages that are applicable to different tasks from three 
perspectives. 

First and most obviously, there are two tasks (actions), 
namely hunting, i.e. moving into the jungle, and mating, i.e. 
attempting to mate with a partner, that need to be coordinated 
through communication. Each action receives negative or 
positive rewards under different environmental states. 

Second, if the long-term reward of these actions remains 
constant in a changing environment, the so-called “different 
tasks” are actually a single task interpreted subjectively with 
the semantics of multiple tasks. After all, the core challenge 
of multitasking is that possible actions pursuing different tasks 
must be properly prioritized and coordinated in order to 
achieve higher rewards in the long run. 


Indeed, agents in the jungle world are required to coordinate 
and prioritize their tasks throughout their lifetime via the 
language they evolve. For instance, agents in the socializing 
phase usually have a fitness score higher than 100, which 
gives them legitimate choices of hunting and mating. 
Successful mating always yields a higher reward than hunting. 
However, fitness decreases at the end of each trial. Therefore 
the agents have to hunt regularly to maintain a high fitness 
level. Since both actions are always available to the agents, 
they must learn to prioritize the actions and coordinate with 
partners in pursuing each task. Note that the messages sent and 
received are part of the environmental state, and in some 
variants of the jungle world, messaging is the only way in 
which agents can communicate state or convey intention. In 
this sense, the jungle world presents a true multi-task 
challenge. 

Third, due to LAR and the fitness requirement for mating, 
agents in the jungle world must adapt their strategy at different 
stages of their life. As in nature, junior agents must first learn 
to interpret the environment and hunt successfully through 
communicating with their parents in the parenting phase. 
Before their fitness score can be maintained at a high level (i.e. 
greater than 100), mating is not a viable choice. However, as 
agents proceed into the socializing phase, they must adjust 
their strategy and learn to balance mating and hunting in order 
to maximize their cumulative rewards. 

Language Acquisition. LAR ensures that agents have to learn 
to survive the jungle world during their lifetime rather than 
relying solely on the information encoded in their genes. LAR 
thus makes simulation of language evolution more realistic: in 
nature, language and knowledge are largely acquired through 
lifetime learning rather than genetically encoded. While some 
theories suggest that the human genotype encodes the 
universal grammar (Chomsky and DiNozzi 1972), it is 
commonly accepted that any particular language should be 
learned. 

From a technical perspective, LAR presents interesting 
challenges. Many powerful methods such as neural networks 
are usually not directly applicable to learning in real time. 
Meanwhile, traditional reinforcement learning algorithms 
such as Q learning require accurate and full observation of 
environmental state. In contrast, in typical settings of the 
jungle world, states are only partially observable, i.e. agents 
cannot observe the fitness or the position of their trial partners. 
While communication can be leveraged to compensate for partial 
observation, at the early stage of evolution, the semantics of the 
messages are rather unreliable and frequently changing. Even 
after a language is established among the senior population, 
messages from junior agents in the parenting phase can 
mislead or confuse their parents and in the worst case, reverse 
the progress of language evolution in previous generations. 

While reflecting important features of language in nature, 
both multitasking and LAR present interesting challenges to 
the method used in such simulations. 


Method 

This section presents details on Evolutionary Reinforcement 
Learning with Potentiation and Memory (ERL-POM) - the 
method adopted to allow efficient and effective simulation in 
the jungle world. The reinforcement learner serves as the brain 
of the agents. In each generation, every individual in the 
population has its own reinforcement learner. Through 
lifetime interaction with trial partners, these controllers adjust 
their policy gradually to achieve higher cumulative rewards. 

Controller Structure 

This subsection introduces the structure of the controller. 

Inputs and Outputs. The controller assumes that both the 
inputs (i.e. environmental states) and the outputs (i.e. action 
decisions) are binary, or can be converted to binary. 

Learner Unit. Similar to neurons in a neural network, learner 
units are the basic functional units of a controller. They take 
binary inputs (multiple bits) and give a single-bit output. Each 
unit consists of a policy map and a memory. The policy map is 
implemented as a hash map whose keys are the input patterns 
and whose values are activation parameters (APs). 

Memory. Memory is a queue of input-output pairs with a 
certain size. Inspired by the short term memory in nature, old 
items in the queue are replaced by new items once the number 
of items reaches the memory size. Memory provides a record of 
decisions given input patterns and helps the learner unit adjust 
APs to maximize long term reward. 

Expansion. Since both inputs and outputs are binary, learner 
units can be connected similarly to neurons in a neural 
network to form various complex structures. They also adjust 
their weights based on memory and rewards while interacting 
with the environment. Thus, a controller is essentially a neural 
network designed for real-time learning. 

However, it is worth pointing out that learner units are more 
powerful and complicated than neurons in typical neural 
networks. Therefore, it usually takes fewer learner units than 
neurons to solve a problem. As a matter of fact, given the 
typical setting of the jungle world, merely four learner units 
are needed to achieve high average cumulative rewards, which 
is similar to a single layer neural network if all learner units 
are replaced by neurons. 


Figure 1: Controller Structure (the diagram is labeled with the 
environmental state as input and the action(s) as output). A controller may contain one 
or multiple layers of learner units, connected similarly to 
neurons in a neural network. Each learner unit consists of a 
policy and a memory. The policy is a hash map whose keys 
are input patterns and whose values are activation parameters, and the 
memory is a queue of input-output pairs. 


Real Time Learning and Evolution 

Initially, all policy maps are empty. As an agent explores the 
artificial world, learner units receive inputs and insert a new 
entry to their policy map for each previously unseen input. The 
AP in each new entry is set to 0. The probability of activation 
Pa given the activation parameter AP of an input is computed 
according to formula 1. 

P_a = \sigma(AP) = \frac{1}{1 + e^{-AP}} \qquad (1)

Here, σ is the sigmoid function. Note that for each new 
entry, the probability of activation is 0.5, i.e. the learner unit 
performs random exploration. 
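
The behaviour described so far can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the class name `LearnerUnit`, the method `act`, and the use of a `dict` and a bounded `deque` are assumptions made for the example.

```python
import math
import random
from collections import deque
from typing import Optional


def sigmoid(ap: float) -> float:
    """Formula 1: activation probability for an activation parameter AP."""
    return 1.0 / (1.0 + math.exp(-ap))


class LearnerUnit:
    """Hypothetical learner unit: a policy map plus a bounded memory queue."""

    def __init__(self, memory_size: int, rng: Optional[random.Random] = None):
        self.policy = {}                         # input pattern -> AP
        self.memory = deque(maxlen=memory_size)  # oldest events drop out first
        self.rng = rng or random.Random()

    def act(self, pattern: tuple) -> int:
        # A previously unseen input gets a new entry with AP = 0 (P_a = 0.5).
        ap = self.policy.setdefault(pattern, 0.0)
        output = 1 if self.rng.random() < sigmoid(ap) else 0
        # Record the event (input-output pair) in the memory queue.
        self.memory.append((pattern, output))
        return output
```

With an empty policy map, every first decision on a new input pattern is a fair coin flip, which is the random exploration noted above.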

Learning with Memory. Whenever a learner unit makes a 
decision given an input, the event (input-output pair) is pushed 
into the memory queue. Once the queue is full, the oldest records 
are replaced by new ones. 

When the agent collects a reward r, each learner unit looks 
into its memory. For each event (i.e. input-output pair) in the 
queue, the learner unit updates the AP of the corresponding 
input in the policy map according to formula 2. 

AP' = a \cdot AP + r \cdot d^{n} (2x - 1) \qquad (2)

Here, AP and AP' are the activation parameters before and 
after the update, respectively; a is the decay rate (0 ≤ a ≤ 1); r 
is the normalized reward (-1 ≤ r ≤ 1); d is the discount factor 
(0 ≤ d ≤ 1); n is the event index in the memory queue, with 0 
representing the most recent event; and x is the binary output. 

Intuitively, learner units increase the AP mapped to an input 
pattern (thus the activation probability given that input) if (1) 
a positive reward is received, and the unit outputs 1, or (2) a 
negative reward is received, and the unit outputs 0. The decay 
rate balances the influence of knowledge from the past with 
the most recent experience. When a = 0, only the latest 
experience is taken into consideration. The discount factor 
reflects the contribution of decisions in the past. If d = 0, only 
the last decision is assumed to be the cause of the reward, and 
if d = 1, the reward is assumed to be equally attributable to all 
events in the memory. 
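
A minimal sketch of the update in formula 2 is given below. The function name `update_aps` and the data layout (a `dict` policy and a memory sequence ordered oldest first) are assumptions for illustration. The sketch applies the formula once per remembered event, so an input pattern that occurs several times in memory is updated several times; the text does not say whether this is how repeated patterns are handled.

```python
def update_aps(policy, memory, r, a, d):
    """Apply formula 2 to every event in memory.

    policy: dict mapping input pattern -> AP
    memory: sequence of (pattern, output) pairs, oldest first
    r: normalized reward, a: decay rate, d: discount factor
    """
    # Event index n = 0 is the most recent event, so walk the memory backwards.
    for n, (pattern, x) in enumerate(reversed(memory)):
        policy[pattern] = a * policy[pattern] + r * (d ** n) * (2 * x - 1)


# Example: one positive reward after outputs 0 (older) and 1 (newest).
policy = {"p": 0.0, "q": 0.0}
memory = [("q", 0), ("p", 1)]
update_aps(policy, memory, r=1.0, a=0.5, d=0.5)
```

In the example, the newest event (output 1) raises the AP of "p" by the full reward, while the older event (output 0) lowers the AP of "q" by the reward discounted once.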

Potentiation. Because junior partners behave randomly, it is 
possible that senior partners are confused during the parenting 
phase. Therefore, potentiation is introduced to retain long-term 
memory, i.e. well-tested knowledge and rules learned in 
the past. 

As in nature, if the brain is confident enough in a decision 
for a certain input, that decision is fixed. Specifically, if the AP of 
an input satisfies the condition in formula 3, the learner unit 
fixes the decision on that input to 1 if the AP is positive, and to 0 
if it is negative. 

\frac{|\sigma(AP) - 0.5|}{0.5} \geq PT \qquad (3)

Here PT (0 ≤ PT ≤ 1) is the potentiation threshold. Intuitively, 
PT specifies how confident the controller must be in order to 
fix its decision. If PT = 0, the controller will fix its decision 
after learning from a single event, and if PT = 1, the controller 
will never fix its decision. 
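
The potentiation test can be sketched as below. The function name `fixed_decision` is hypothetical, and two details are assumptions rather than statements from the text: an AP of exactly zero is never potentiated (its sign is undefined), and PT = 1 is treated explicitly as "never fix", because floating-point rounding can make the sigmoid evaluate to exactly 1 for large APs.

```python
import math


def fixed_decision(ap, pt):
    """Return the fixed output (1 or 0) if the decision is potentiated, else None."""
    if pt >= 1.0 or ap == 0:
        # PT = 1 never potentiates; AP = 0 has no sign to fix (assumption).
        return None
    # Confidence is the normalized distance of P_a from 0.5 (formula 3).
    confidence = abs(1.0 / (1.0 + math.exp(-ap)) - 0.5) / 0.5
    if confidence >= pt:
        return 1 if ap > 0 else 0
    return None
```

For example, with PT = 0.9 an AP of 1.0 (activation probability about 0.73) is not yet fixed, whereas an AP of 5.0 (activation probability about 0.99) is fixed to 1.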

Evolution. As introduced in the previous subsections, each 
learner unit is defined by the following parameters: (1) the 
potentiation threshold, (2) the discount factor, (3) the decay 
rate, and (4) the memory size. Thus, a controller with m 
learner units and a fixed topology has a genome of 4m 
numbers: m positive integers with an upper bound 
(i.e. the maximum size of the memory), and 3m real numbers 
in [0, 1]. Since all controllers have numeric genomes with 
a uniformly defined structure, mutation and crossover can be 
applied directly to produce controllers for a new generation. 

In the above mechanism, the role of evolution is to explore 
the learner space and optimize parameters for units in the 
controller. In other words, evolution aims to improve learning 
ability rather than encode policies. Junior agents in a 
generation are equipped with potentially better learning tools. 
Nevertheless, they have no specific knowledge of the rules of 
the world or the language among the senior agents. Thus, this 
method satisfies LAR as defined in the previous section. 

Experimental Results 

This section presents and discusses the experimental results of 
the situated simulation under four different settings of the 
jungle world. In the first setting, communication channels are 
disabled, and the environment is fully observable, i.e. agents 
can observe the fitness and position of their partners directly. 
The second setting allows full observation while enabling 
communication. In the third setting, communication channels 
are enabled, but agents cannot observe the position or fitness 
of their partner. The fourth setting is the same as the third, 
except that partner position is observable. The above four 
settings aim to address the following questions: 

1. When communication channels are disabled and 
environmental states fully observable, what kind of 
behaviors emerge as a baseline? 

2. If environmental states are fully observable, i.e. 
communication is enabled but unnecessary, will any 
language emerge? 

3. When communication is necessary for all tasks, can 
the agents evolve a language to coordinate their 
actions? 

4. If communication is necessary for some of the tasks 
but not for the others, will any language emerge? 

In each of the experiments, any language that emerges is 
analyzed to understand what it is and how it helps agents to 
perform their tasks. 

Experimental Setup 

Table 1 shows the parameter settings used in all simulations. 
The normal distributions (ND) have a standard deviation of 
0.1, with a mean of zero. Mutation rate is applied to each gene 
independently. A special rule (SR) is applied to the mutation 
of memory size: it increases or decreases by one with 0.5 
probability for each. 

After the seniors are selected from the previous generation, 
each senior has one chance to be paired with another senior 
randomly to produce a junior for the next generation. The 
genome of the first senior is used as the initial genome of the 
child. Each gene (i.e. number) is then replaced by the 
corresponding gene of the random spouse with a probability 
equal to the crossover rate. In addition, it can mutate based on 
the mutation rate and rules in Table 1. Crossover always takes 
place before mutation. 

Item              Value/Setting
----------------  ------------------------------
Population Size   50
Senior/Junior     25/25
Mutation Rate     0.1
Mutation Rule     Gene   Initial Value (Method)
                  MS     1 (SR, 1 ≤ MS ≤ 10)
                  PT     1.0 (ND, 0 ≤ PT ≤ 1)
                  a      1.0 (ND, 0 ≤ a ≤ 1)
                  d      0.0 (ND, 0 ≤ d ≤ 1)
Crossover Rate    0.1

Table 1: Parameters and Mutation Rules 
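
The reproduction step can be sketched as follows, using the rates and bounds from Table 1. The function name `make_child`, the per-unit tuple layout (MS, PT, a, d), and the choice to clamp mutated values to their bounds are assumptions; the text gives the bounds but does not say how they are enforced.

```python
import random

CROSSOVER_RATE = 0.1
MUTATION_RATE = 0.1
MS_MIN, MS_MAX = 1, 10


def make_child(parent, spouse, rng):
    """parent, spouse: lists of (MS, PT, a, d) tuples, one per learner unit."""
    child = []
    for unit_genes, spouse_genes in zip(parent, spouse):
        genes = list(unit_genes)  # start from the first senior's genome
        for i in range(4):
            # Crossover comes first: take the spouse's gene with probability 0.1.
            if rng.random() < CROSSOVER_RATE:
                genes[i] = spouse_genes[i]
            # Mutation is applied to each gene independently.
            if rng.random() < MUTATION_RATE:
                if i == 0:
                    # Special rule (SR): memory size moves by +1 or -1.
                    genes[i] = min(MS_MAX, max(MS_MIN, genes[i] + rng.choice((-1, 1))))
                else:
                    # Normal distribution (ND): mean 0, standard deviation 0.1.
                    genes[i] = min(1.0, max(0.0, genes[i] + rng.gauss(0.0, 0.1)))
        child.append(tuple(genes))
    return child


# First-generation genome for a four-unit controller (initial values of Table 1).
first_gen = [(1, 1.0, 1.0, 0.0)] * 4
junior = make_child(first_gen, first_gen, random.Random(1))
```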

Note that in table 1, Initial Value refers to the values of 
parameters of learner units in the first generation. Memory 
size is set to one so that agents memorize only the last event. 
Potentiation threshold is 1.0, thus the agents never fix their 
decisions. Discount factor is 0.0, i.e. reward is assumed to be 
attributable to only the latest action. In other words, memory 
and potentiation do NOT exist for the first generation. It is up 
to evolution to decide if they are desirable. Such a setting 
serves two purposes: (1) it demonstrates the benefit of 
memory and potentiation through evolution, and (2) it avoids 
unnecessary structural and algorithmic complication. 
However, it has a downside of potentially delaying the 
emergence of language in the artificial world since such a 
simple starting point may be far from the best settings. 

Experiment 1 (Baseline) 

In this experiment, the communication channels are blocked 
and the environment is fully observable to the agents. Thus, 
agents cannot send messages to their partner, but they can 
observe the fitness and the position of their partner directly. 
The results in this group serve as a reference to evaluate the 
performance in other experiments. Figure 2 shows the results. 



Figure 2: Experiment 1. In approximately eight generations, 
accumulated rewards stabilize, and performance differs 
little between juniors, seniors, and the 
champion. All results, including those of the other 
experiments, are averaged over 20 runs. 

Table 2 presents a sample policy of a champion in the 100th 
generation. According to the policy, the champion mates with 
its partner whenever they are both ready to mate (i.e. fitness > 
100). Until then, it moves towards the jungle as long as its 
position is greater than one. When getting close to the jungle, 
the champion waits for the partner if it is not in position, and 
jumps into the jungle as soon as both of their positions equal 
one. 

Table 2: Champion Policy - Experiment 1. Regular letters are 
input states, and bold letters are actions. F represents fitness, 
i.e. whether the fitness score is greater than 100; P is position, 
i.e. whether position equals 1; F’ and P’ indicate partner’s 
fitness and position, respectively; A is the action to approach 
the jungle; and M is the action to mate. 

Lastly, memory size after ten generations averages 1.28, 
with a majority of learner units having no memory beyond the 
last decision. This result can be explained by the fact that in a 
fully observable environment, with a majority of senior agent 
policies like that in table 2, the challenge faced by a junior 
agent is largely a Markovian problem. On the other hand, the 
average potentiation threshold is 0.965 - the agents do fix their 
actions, but only when they are very confident in their 
decisions. 

Experiment 2 (Unnecessary Communication) 

Experiment 2 differs from Experiment 1 in that agents 
are allowed to send and receive messages. Since the position 
and fitness of their partner are still observable, communication 
is possible but unnecessary for achieving any of the tasks. 

Figure 3 presents the results of Experiment 2. Memory size 
after ten generations averages 1.42, and the average potentiation 
threshold is 0.971. 





Figure 3: Experiment 2. Average cumulative rewards of each 
generation are nearly identical to those in Experiment 1. 


The champion policies in Experiment 2 are characterized by 
the following three observations: 

1. Given the same observation of fitness and positions, 
actions are the same regardless of the message 
received. 

2. Messaging policies vary from generation to 
generation, while having no influence on the stability 
of performance. 

3. There is no clear correlation between messages and 
environmental states in most champion policies. 

The results in Experiment 2 suggest that if communication 
is unnecessary for any of the tasks, messaging policy plays no 
role in agent performance. Agents evolve no language even 
when the communication channels are available. 

Experiment 3 (Necessary Communication) 

In this experiment, agents cannot observe the position and 
fitness of their trial partners. Therefore, the only means by 
which agents can coordinate their efforts in mating and 
hunting is through communication. 





Figure 4: Experiment 3. Champions in the agent population 
achieve similar performance to that of Experiment 1 after 
approximately 25 generations. Champion performance (red) 
stabilizes afterwards. However, compared to Experiment 1, 
average cumulated rewards among entire generations (blue) 
and among the juniors (green) are lower and have bigger gaps 
in between. 

The reason for the lower average performance is that multiple 
languages may occur simultaneously in one generation, 
causing confusion among the juniors in the parenting phase and 
thus lowering their performance in the socializing phase. 

Table 3 presents a sample policy of a champion in the last 
generation. Champion policies after thirty generations encode 
the fitness and position accurately and consistently in 19 out 
of the 20 runs. However, in almost all generations, more than 
20% of the seniors have a messaging policy that either fails to 
encode fitness or position states accurately, or differs from the 
champion policy. 

Also, the average memory size after 30 generations is 7.10, and 
the average discount factor is 0.722. The reason for the long 
memory is that past decisions can be used to complement 
partial observation and improve learning efficiency. For 
instance, if an agent keeps sending wrong messages while 
being in position for a hunt, its trial partner can never know that 
the agent is ready, resulting in a punishment for idling for both 
agents in the trial. While potentiation may keep the seniors 
from “second guessing” their correct policy, a long memory 
of the past can help the juniors learn faster in such cases. 

Table 3: Champion Policy - Experiment 3. S1 and S2 
represent the first and second message bit; R1 and R2 are the 
message bits received from the partner. All other letters have 
the same meaning as those in Table 2. The messages encode 
fitness and position values. 

Although the average potentiation threshold is close to one 
(0.981), potentiation is crucial in the simulation because the 
parenting phase is sufficiently long to generate enough 
confusing interactions to reverse knowledge encoded in the 
seniors’ controllers. In fact, if potentiation threshold is fixed 
to one (i.e. potentiation does not exist), language simply 
cannot be established among the agent population, rendering 
cumulative rewards consistently negative in all generations. 

Experiment 4 (Partially Necessary Communication) 

In this experiment, the position of a trial partner is observable 
while the fitness of the partner is hidden. Communication is 
possible between partners in a trial; however, it is necessary 
only for mating. 




Figure 5: Experiment 4. Average performance in the first 10 
generations is similar to that in Experiment 1. After 15 to 20 
generations, rewards stabilize at a level approximately 50% 
higher than that in Experiment 1. 


Typical messaging policies in the first ten generations of 
each run encode fitness in two bits (e.g. “11” for “ready to 
mate” and “01” otherwise). As evolution proceeds, more 
advanced policies may emerge, leading to better performance 
than the full observation baseline. Table 4 shows a messaging 
policy from a champion in the 100th generation. 

Table 4: Champion Messaging Policy - Experiment 4. Dash 
indicates that an input has no effect on the outputs. 


While the messaging policy in Table 4 looks confusing by 
itself, it exploits the setting of the jungle world 
effectively when combined with the corresponding action policy. 
Although such a policy can be expressed by a table as before, 
it is translated into the following rule-based policy due to 
limited space. 

1. Run towards the jungle until position is one. 

2. Wait until partner position becomes one. 

3. Enter the jungle if positions of both agents are one. 

4. If “11” is received before entering the jungle, mate 
and enter the jungle at the same time step. 

Note that Rules 1 to 3 are based purely on observation, i.e. 
they have nothing to do with the messages received because 
partner position is directly observable. In fact, the only rule 
that relies on communication is Rule 4. It can be triggered only 
under the circumstance where both agents are one step away 
from the jungle and “11” is received; and according to Table 
4, “11” is sent whenever an agent is ready for both hunting 
and mating. Since messages are ignored in all other scenarios, 
the only messaging rule that a junior agent needs to learn is to 
send “11” when F and P are both one. This rule has two 
positive effects. First, the juniors learn the language faster 
because it tolerates faults on all but a few inputs (i.e. inputs 
with F and P equal to one). Second, combined with the action 
policy, it allows agents to mate and hunt successfully in a 
single trial, making the average fitness of the population close 
to the maximum (200) all the time. 
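
The four rules and the single essential messaging rule can be sketched as follows. This is a literal, illustrative reading only: the action names, the function names `act` and `message`, and the choice of "01" for every message other than "11" are assumptions, since the text states only that "11" is sent when F and P are both one.

```python
def act(position, partner_position, received):
    """Return the set of actions for one time step under Rules 1 to 4."""
    if position > 1:
        return {"approach"}          # Rule 1: run towards the jungle
    if partner_position > 1:
        return {"wait"}              # Rule 2: wait for the partner
    if received == "11":
        return {"mate", "enter"}     # Rule 4: mate and enter in the same step
    return {"enter"}                 # Rule 3: both positions are one


def message(fit, in_position):
    """Send "11" only when ready for both mating (F) and hunting (P)."""
    return "11" if fit and in_position else "01"  # "01" is a placeholder
```

Note that the received message is consulted only once both agents are in position, which is why errors on all other inputs are harmless.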

Among the 20 runs in Experiment 4, approximately half 
(11/20) discovered policies with similar principles, thus 
achieving higher performance than Experiments 1 and 2. Note 
that without the emergence of meaningful language, agents in 
Experiment 1 or 2 cannot discover behaviors that accomplish 
hunting and mating in a single trial. The fact that languages can 
be evolved to get around deceptive local optima (e.g. policies 
that decide to mate whenever both agents are ready) is 
intriguing. 

Additionally, the results of Experiments 2 and 4 suggest that for 
language to emerge, it is essential that language is indeed 
necessary to perform some of the tasks. However, once 
language appears, it can be evolved into a beneficial tool for 
all tasks. 

Future Work 

Situated simulation of language evolution provides interesting 
insights on the origin and evolution of language. The jungle 
world simulation can be used as a starting point for more 
advanced simulations in two ways. 

First, in nature, spoken language is formed from sequential 
patterns of utterances. Messages may span multiple time steps 
rather than being contained in a single step. Also, they may start at 
any step. Such sequential features are essential for simulating 
the evolution of more complex and structured languages. 

Second, languages in the real world are usually structured 
based on syntax. The emergence of grammatical components 
and structures such as nouns and verbs, subjects and objects, 
phrases and sentences is an important aspect of language 
evolution. A possible approach in the jungle world is to 
establish social roles in the simulation, and create tasks around 
them. Grammatical structure might then emerge in order to 
communicate such role-based information (Bickerton 1990). 

Integrating sequential and/or structural features into the 
jungle world framework will make simulations more realistic 
and informative. 

Conclusion 

This paper presents a framework for situated simulation of 
language evolution. It introduces an artificial environment, the 
jungle world, which can be used to simulate the evolution and 
acquisition of multitask languages. The paper also proposes a 
method: Evolutionary Reinforcement Learning with 
Potentiation and Memory (ERL-POM) for simulation of 
language evolution in this environment. 

Experimental results indicate that languages can be evolved 
in the artificial environment if communication is necessary for 
some or all of the tasks. Languages can be used to coordinate 
efforts in multiple tasks where communication is required. 
When communication is not necessary for all tasks, languages 
can be leveraged to overcome local optima and discover better 
policies. Experimental results also show that memory and 
potentiation are necessary for such emergence. Extending the 
simulation to sequential and structured communication is a 
most interesting direction of future work. 

Abstract 

Understanding how the dynamics of language learning and 
language change are influenced by the population structure of 
language users is crucial to understanding how lexical items 
and grammatical rules become established within the context 
of the cultural evolution of human language. This paper 
extends the recent body of work on the development of 
term-based languages through signalling games by exploring 
signalling game dynamics in a social population with 
overlapping generations. Specifically, we present a model with 
a dynamic population of agents, consisting of both mature 
and immature language users, where the latter learn from the 
former’s interactions with one another before reaching 
maturity. It is shown that populations in which mature 
individuals converse with many partners are better able to solve 
more complex signalling games. While interacting with a 
higher number of individuals initially makes it more difficult 
for language users to establish a conventionalised language, 
doing so leads to increased diversity within the input 
for language learners, and this prevents them from 
developing the more idiosyncratic languages that emerge when 
agents only interact with a small number of individuals. This, 
in turn, prevents the signalling conventions from having to be 
renegotiated with each new generation of language users, 
resulting in the emerging language being more stable over 
subsequent generations of language users. Furthermore, it is shown 
that allowing the children of language users to interact with 
one another is beneficial to the communicative success of the 
population when the number of partners that mature agents 
interact with is low. 

Introduction 

The fact that children around the world are readily able to 
learn the language of their given social group, even though 
these languages are in a constant state of flux (Hopper, 1987), 
exhibit high levels of variation and flexibility of usage, and 
are ever changing over time within dynamic populations 
(Christiansen and Kirby, 2003), indicates that cultural factors 
play a crucial role in the shaping of human language. The 
establishment of the meanings of lexical items and the 
subsequent change in these meanings over time is, in part, what 
led Lewis (1969) to work on the conventionality of meaning: 
how specific arbitrary signals establish themselves as 
referring to a specific meaning. He introduced a signalling 
game in order to explore how meaningful language might 
evolve from the use of initially random signals. Over the 
last decade, renewed interest in these ideas has led to a body 
of work that has explored the evolution of term-based 
languages through coordination games (Skyrms, 2004, 2009, 
2010; Huttegger, 2007; Barrett, 2006, 2009; Argiento et al., 
2009). 

In this paper, we further develop this work by implementing 
a reinforcement learning (R-L) model involving a 
single Sender-Receiver pair, which is then extended into a 
population-based, multi-generational simulation. This is 
both novel and necessary, given that human language persists 
in a complex social milieu which is not captured by 
standard R-L models. Employing the R-L procedure in 
a population-based model could therefore offer insights into 
how lexical items become established within the context of 
the cultural evolution of human language in structured 
populations with overlapping generations. 

This paper presents a model that demonstrates that, if 
agents only interact with a small number of other agents, 
then it is easier for these agents to establish a conventionalised 
system of language usage than in cases where they 
interact with a larger number of the population. However, 
by interacting with a smaller subset of the population, these 
individuals develop a more idiosyncratic language. Thus, it 
becomes difficult for the children of these agents, who learn 
from the interactions of their parents, to communicate with 
children of other mature agents during future epochs. In 
contrast, allowing individuals to interact with a larger 
proportion of the population does initially make it more difficult 
to establish agreed-upon conventions of usage, but it 
does result in an increased amount of diversity within the 
language learner’s training input data. This better enables 
the children of these mature agents to successfully interact 
with the offspring of other mature agents, previously 
unencountered by the agent in question; this aids the negotiation 
of conventional signalling in the population as a whole. 
This, in turn, leads to the development of a language that 
is more stable and consistent over generations of language 
users, compared to the case where individuals have to 
renegotiate conventions of use in every generation. 

The n = 2 game 

In a Lewis signalling game there are two players, a Sender 
and a Receiver. A single bout of the game commences with 
the Sender knowing that the world is in some random state, 
t, but the Receiver being ignorant of this information. The 
Sender then selects a signal, s, with which to convey the 
world state to the Receiver; the Receiver observes s and has 
to pick an appropriate action, a. If the action chosen by the 
Receiver matches the world state (i.e., a = t), the bout is 
considered to have been a success. Here, t, s, and a are 
drawn from finite sets T, S, and A, respectively, which are 
all of size n; in Lewis’ (1969) original model n = 2. 

Over successive bouts of the game, both players are 
expected to adapt their behaviour in order to increase 
the chance of achieving communicative success, typically 
through some kind of reinforcement learning. The easiest 
way to conceptualise this is in terms of urns and balls. At 
the outset of the simulation run, an unbiased Sender will 
have n urns, one for each state of the world, each of which 
contains n balls, one associated with each of the n possible 
signals. Let’s suppose that during the first bout of the game, 
t = “red”. The Sender picks a random ball from their red 
urn. The symbol on this ball dictates the signal to be made, 
s; in this case, suppose s = “fah”. Likewise, the Receiver 
observes s = “fah”, and picks a random ball from their fah 
urn, which indicates the action to be taken, a. Both balls are 
then returned to their respective urns. If a = t, the interaction 
was a success, and in accordance with the principles 
of Roth-Erev reinforcement learning (Roth and Erev, 1995), 
the Sender adds extra balls of type s to urn t and the Receiver 
adds extra balls of type a to urn s. The number of 
extra balls, u, added to the urns corresponds to the utility 
associated with the outcome of the signalling bout; in Lewis’ 
(1969) original game u = 1 if a bout is successful and u = 0 
otherwise. More formally, at any point in time, b(t, s) is the 
number of balls for signal s in the Sender’s urn for state t, 
and accordingly, b(s, a) is the number of balls corresponding 
to act a in the Receiver’s urn s. Thus, the behavioural 
strategies for Sender (σ) and Receiver (ρ) are as follows: 

\sigma(s \mid t) = \frac{b(t, s)}{\sum_{s'} b(t, s')}, \qquad \rho(a \mid s) = \frac{b(s, a)}{\sum_{a'} b(s, a')} \qquad (1)
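
The urn scheme described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the reported results: the function name `play_lewis_game` is hypothetical, world states are assumed to be drawn uniformly at random, and the utility is u = 1 for a successful bout and u = 0 otherwise.

```python
import random


def play_lewis_game(n, bouts, rng):
    """Roth-Erev urn learning for one Sender-Receiver pair; returns the success rate."""
    sender = [[1] * n for _ in range(n)]    # sender[t][s]: balls for signal s in urn t
    receiver = [[1] * n for _ in range(n)]  # receiver[s][a]: balls for act a in urn s
    successes = 0
    for _ in range(bouts):
        t = rng.randrange(n)                                # random world state
        s = rng.choices(range(n), weights=sender[t])[0]     # draw from urn t
        a = rng.choices(range(n), weights=receiver[s])[0]   # draw from urn s
        if a == t:
            sender[t][s] += 1       # one extra ball of type s in urn t
            receiver[s][a] += 1     # one extra ball of type a in urn s
            successes += 1
    return successes / bouts


rate = play_lewis_game(n=2, bouts=20000, rng=random.Random(0))
```

Drawing a ball with probability proportional to its count is exactly the proportional choice rule that the urn metaphor describes.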

There are a number of possible signalling equilibria that 
can arise in such a game. Perfect signalling strategies result 
in optimal pay-offs for the players by mapping each world 
state onto a unique signal and each signal onto the unique 
appropriate action (Figure 1). This behaviour constitutes 
an evolutionarily stable strategy (ESS) because when it is 
played by the whole population there is no incentive for any 
individual to change their strategy. 

Figure 1: Optimal strategies for the n = 2 game. 


However, players may spend significant time playing sub-optimal 
‘pooling’ strategies in which Senders employ the 
same signal for multiple world states (pooling these world 
states together), making it impossible for Receivers to determine 
the state of the world from the signals that they receive 
(Figure 2). Pooling strategies are not ESSs since adjacent 
strategies often achieve equal fitness (e.g., the two pooling 
strategies in Figure 2). The expected pay-off for such a pooling 
strategy is 0.5 when n = 2. 

Figure 2: Two of the possible sub-optimal pooling strategies 
for the n = 2 game. 
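
The pay-off of 0.5 for pooling can be checked directly. The sketch below assumes equiprobable world states and represents strategies as probability tables; the function name `expected_payoff` is hypothetical.

```python
def expected_payoff(sender, receiver):
    """sender[t][s] = P(s | t), receiver[s][a] = P(a | s); states are equiprobable."""
    n = len(sender)
    # A bout succeeds when the Receiver's action equals the world state t.
    total = sum(sender[t][s] * receiver[s][t]
                for t in range(n) for s in range(len(receiver)))
    return total / n


# Perfect signalling: each state has its own signal and each signal its own act.
perfect = expected_payoff([[1, 0], [0, 1]], [[1, 0], [0, 1]])

# Pooling: the Sender always uses signal 0 and the Receiver always picks act 0.
pooling = expected_payoff([[1, 0], [1, 0]], [[1, 0], [1, 0]])
```

Here `perfect` evaluates to 1.0 and `pooling` to 0.5: under pooling the Receiver is right only when the world happens to be in the one state that its fixed action matches.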

It has been shown by way of both computational simulation 
(Barrett, 2006, 2009; Skyrms, 2010) and mathematical 
modelling (Huttegger, 2007; Argiento et al., 2009) that 
the n = 2 game will nearly always converge upon an optimal 
signalling system. Indeed, Skyrms (2009) went on to 
demonstrate that this behaviour also holds in a case where 
there are two Senders and one Receiver. These results are 
further supported by Table 1, where it can be seen that a 
computational model of the n = 2 game being played for 
10^6 bouts will almost always reach a perfect signalling 
equilibrium. 

Higher-n games 

However, a successful outcome is not always achieved when 
the game is played with n > 2, i.e., with a higher number 
of states, signals and actions (Skyrms, 2010; Huttegger, 
2007; Barrett, 2006, 2009). Here, we adopt the methodology 
of Barrett (2006, 2009). The R-L model that formed 
the basis of the population-based R-L model was run multiple 
times, for various values of n, with each run consisting 
of 10^6 bouts, B, of the game, where a run of the simulation 
is considered to fail if the number of successful bouts 
is less than 90% of the total number of bouts. Table 1 (left) 
shows the results of these runs, which agree with those of 
Barrett (2006, 2009). Table 1 (right) shows the results of 
a smaller sample of 100 runs, with all other parameters 
held constant; the results are quite similar. It can 
be seen clearly from Table 1 that, in an n = 3 game, the players 
fail to achieve a high enough rate of signalling success 
roughly 10% of the time, and that this increases to ≈ 20% 
for n = 4 games, ≈ 60% for n = 8 games, and so on. The 
comparison in Table 1 is important, as the extended 
model presented later is run for 100 runs due to limits on 
computational power. 


Table 1: Success rates of the R-L model 
after 10^6 bouts for various values of n, with 1000 runs (left) 
and 100 runs (right). 


Generational Reinforcement Learning Model 

Although interesting in its own right, the dyadic setting 
considered so far limits the conclusions that can be drawn. 
After all, human language persists in a highly complex social 
milieu, and it has been shown that the structure and composition 
of a population can influence the dynamics of language 
change over time (Brace et al., 2015). Thus, the original 
reinforcement learning model (R-L) was extended in a 
number of ways. First, whereas the original model focused 
on a single Sender and Receiver playing for B bouts, in the 
population model (R-L-P), there is a population of agents. 
This population is divided into a number of mature agents, 
Nm, and a number of immature agents, Ni, with all agents 
starting life as immature and then being promoted to mature 
status after the first epoch of their existence. Mature agents 
play bouts of the language game with one another, updating 
their language behaviour according to game outcomes. By 
contrast, while agents are immature they merely observe the 
language bouts played by their mature parent, and update 
their language behaviour on the basis of the outcomes of 
these observed games. The lifespan of agents is two epochs: 
the first as an immature agent and the second as a mature 
agent, after which they are removed from the simulation. 

It is important to emphasise here that throughout the
simulation, when new immature individuals are introduced to
the population, as in the standard R-L model, they have no
knowledge of the language currently being used. This is
true for the initial population of mature agents, and also true
for new immature agents born into all subsequent epochs.
For each immature agent, each world state, t, is associated
equally with each signal, s, when playing as Signaller, and
each signal, s, is associated equally with each action, a,
when playing as Receiver, i.e., each of an immature agent’s
n state urns contains a single ball for each possible signal,
and each of their signal urns contains a single ball for each
possible action. Thus, any change in communicative
performance or language use over generations is the result of
language evolution; there is no biological evolution on the
part of the agents.
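
The unbiased starting state described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the class name `UrnAgent`, its method names, and the assumption that an n-game has n states, n signals, and n actions are all hypothetical.

```python
import random


class UrnAgent:
    """Hypothetical urn-based learner; every urn starts with one ball per option."""

    def __init__(self, n):
        # state_urns[t][s]: number of balls of signal type s in the urn for state t.
        self.state_urns = [[1] * n for _ in range(n)]
        # signal_urns[s][a]: number of balls of action type a in the urn for signal s.
        self.signal_urns = [[1] * n for _ in range(n)]

    def choose_signal(self, state, rng):
        # Draw a ball from the state's urn; probability is proportional to ball count.
        urn = self.state_urns[state]
        return rng.choices(range(len(urn)), weights=urn)[0]

    def choose_action(self, signal, rng):
        urn = self.signal_urns[signal]
        return rng.choices(range(len(urn)), weights=urn)[0]

    def reinforce_sender(self, state, signal, u=1):
        # After a successful bout, add u balls of type `signal` to urn `state`.
        self.state_urns[state][signal] += u

    def reinforce_receiver(self, signal, action, u=1):
        # After a successful bout, add u balls of type `action` to urn `signal`.
        self.signal_urns[signal][action] += u
```

Because every newly created agent starts from the same uniform urns, any later difference between agents comes only from the bouts they played or observed.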

Secondly, instead of agents merely interacting B times,
the R-L model is extended to include a generational aspect.
In other words, the model is set up to run for a number of
epochs, E, and during each epoch, every mature agent plays
B bouts as the Receiver with other mature agents, with the
number of bouts it plays as the Speaker being the result
of how many other agents it is partnered with divided by
B. The number of different mature agents that a mature
agent interacts with, P, is a key parameter of the model,
with each mature agent’s total number of interactions, B,
being equally divided amongst its P unique partners, i.e.,
the number of interactions that an agent has with each
partner is B/P (rounding up). A key feature of the model is
therefore that varying P does not vary the number of bouts
played, just the number of players that the bouts are played
with.
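
As a small worked example of the per-partner allocation, the sketch below assumes that an agent's B bouts are split evenly among its P partners and rounded up, which agrees with the B/P bouts mentioned later in the text. The function name `bouts_per_partner` is hypothetical.

```python
def bouts_per_partner(total_bouts, partners):
    """Return ceil(total_bouts / partners) using integer arithmetic only."""
    if partners < 1:
        raise ValueError("an agent needs at least one partner")
    # -(-a // b) is ceiling division and avoids floating-point rounding for large B.
    return -(-total_bouts // partners)


if __name__ == "__main__":
    B = 10**6
    for P in (1, 2, 5, 10):
        print(P, bouts_per_partner(B, P))
```

For B = 10^6 the split is exact for P = 1, 2, 5, and 10; the rounding only matters when P does not divide B.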

The R-L-P model thus proceeds as follows. At the start
of the simulation run, an initial population of Nm = 15
unbiased mature agents is created, each with an equal chance
of generating each signal for each world state. For each of
the E epochs, a fresh population of Ni = 15 unbiased immature
agents is created, each having an equal chance of generating
each signal for each world state. Each immature agent is
assigned a randomly selected mature agent to act as its
parent, with each mature agent acting as a parent to only one
immature agent. Each mature agent is assigned P unique
randomly selected mature partners with which to play the
signalling game. Each of the mature agents then engages in
B/P bouts with each assigned partner, with each participant
updating their signalling or receiving strategy at the end of
each bout through reinforcement learning. Each child will
update their behaviour based on the outcome of the bouts that
their parent is involved in; i.e., at the end of a successful
bout, a Sender’s child will add a ball of type s to urn t, and
a Receiver’s child will add a ball of type a to urn s. At the
end of an epoch, all mature agents are removed, all immature
agents are promoted to mature agent status, and a new set of
unbiased immature agents is created.
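
The procedure above can be sketched as a small simulation loop. This is a simplified illustration under stated assumptions, not the authors' code: the names `Agent`, `play_bout`, and `run` are hypothetical, an n-game is assumed to have n states, signals, and actions, each mature agent is assumed to act as Receiver with its P partners acting as Senders, and bouts are played partner by partner rather than interleaved. The default parameters are deliberately tiny so that the sketch runs quickly.

```python
import random


class Agent:
    """Hypothetical urn learner; all urns start uniform (an unbiased agent)."""

    def __init__(self, n):
        self.state_urns = [[1] * n for _ in range(n)]   # state -> signal counts
        self.signal_urns = [[1] * n for _ in range(n)]  # signal -> action counts
        self.child = None                               # immature observer, if any


def draw(urn, rng):
    # Draw an option with probability proportional to its ball count.
    return rng.choices(range(len(urn)), weights=urn)[0]


def play_bout(sender, receiver, n, u, rng):
    state = rng.randrange(n)
    signal = draw(sender.state_urns[state], rng)
    action = draw(receiver.signal_urns[signal], rng)
    success = action == state
    if success:
        # Both players and their observing children are reinforced.
        for learner in (sender, sender.child):
            if learner is not None:
                learner.state_urns[state][signal] += u
        for learner in (receiver, receiver.child):
            if learner is not None:
                learner.signal_urns[signal][action] += u
    return success


def run(n=4, n_agents=6, partners=2, bouts=200, epochs=3, u=1, seed=0):
    rng = random.Random(seed)
    mature = [Agent(n) for _ in range(n_agents)]
    rates = []
    for _ in range(epochs):
        immature = [Agent(n) for _ in range(n_agents)]
        # One child per parent: pair children with a random permutation of parents.
        for parent, child in zip(rng.sample(mature, n_agents), immature):
            parent.child = child
        per_partner = -(-bouts // partners)  # B/P, rounded up
        wins = total = 0
        for receiver in mature:
            others = [a for a in mature if a is not receiver]
            for sender in rng.sample(others, partners):
                for _ in range(per_partner):
                    wins += play_bout(sender, receiver, n, u, rng)
                    total += 1
        rates.append(wins / total)
        mature = immature  # old mature agents are removed, children are promoted
    return rates
```

Calling `run()` returns one success rate per epoch; the parameter values used in the paper (Nm = Ni = 15, B = 10^6) can be passed in, at a much higher running cost.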

Results 

The R-L-P model does not achieve a successful signalling
system as often as the standard R-L model. Indeed, comparing
Table 2 to Table 1 shows how, with P = 1, success rates
are lower for all n-games than in the standard R-L model.

Figure 3: Graph depicting the average number of successful bouts
across epochs for P ∈ {1, 2, 4, 8, 10} for an n = 20 game with
Nm = 15, Ni = 15, B = 10^6, and u = 1. Averaged over 30
runs.



          Partners=1   Partners=2   Partners=5   Partners=10
n = 4     0.3          0.36         0.77         0.81
n = 8     0.0          0.3          0.48         0.62
n = 10    0.0          0.2          0.20         0.46
n = 20    0.0          0.0          0.29         0.57

Table 2: Language evolution success rates after 100 R-L-P
model runs of E = 20 epochs each for various values of n
and P, where Nm = 15, Ni = 15, B = 10^6, and u = 1.
Success is measured using the same metric as in Table 1.

However, increasing the value of P does increase the rate
of success (Table 2 and Figure 3). In Figure 3, we see that,
with P = 1 or 2, there is an initial level of success, which
corresponds to the number of successful bouts that would be
seen in the normal R-L model for an n = 20 game.

Low P values create a situation where mature agents form
a communicative system based upon conventions agreed
upon between themselves and only a small number of other
agents. Thus, in subsequent epochs, when the children of a
mature agent have to interact with the children of another
mature agent, who has not previously interacted with the
mature agent in question, the agreed upon conventions that
both parties formulated during the first epoch are likely to
be of little use, due to different agents forming conventions
based upon their idiosyncratic experiences. This gives rise
to sub-optimal behaviour at the population level. However,
any immature agents that are present learn from the successful
bouts of their respective parents, hence the steady increase
in success rates for these lower P values¹.

[x-axis: Average Number of Signals Experienced for Each World State]

Figure 4: Graph depicting the average percentage of successful
communicative bouts between all mature agents plotted against the
number of unique signals presented to them during said bouts in the
second epoch for an n = 20 game, for P = 1, 2, 5, 10 and Nm = 15,
Ni = 15, B = 10^6, and u = 1. Averaged over 60 runs.

In contrast, with high P values, we see an obvious and 
immediate increase in communicative success. This is due 
to the way an increase in P leads to the children of the 
mature agents having more diversity in their training input, 
which better enables these individuals to communicate with 
a larger number of other agents upon being promoted to ma¬ 
ture agent status (Figure 4). 

Imagine a hypothetical mature agent from epoch one, who
is partnered with ten other randomly selected agents, who, in
turn, are partnered with ten other agents. In the simulation,
bouts are scheduled in such a way that agent_1 will have one
of its allocated bouts with one of its randomly selected
partners, then agent_2 will do the same, and so on, until we
reach agent_Nm. At this point we go back to agent_1 and allow
it to have its second bout, again with a randomly selected
partner, and so on until each partner of every agent has played
B/P bouts with the agent. In the P = 1 case, unsurprisingly,
we see higher levels of initial success during the first
generation than in the P = 10 case, because establishing a
convention involves fewer agents having to negotiate with one
another (Figure 5, left).
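
The round-robin scheduling described above can be sketched as follows. This is an illustrative sketch under assumptions: the function name `schedule` and the dictionary-based partner lists are hypothetical, and an agent is assumed to pick uniformly among those of its partners that still have bouts left to play.

```python
import random


def schedule(partners, per_partner, rng):
    """Return the ordered list of (agent, partner) bouts.

    partners: dict mapping each agent id to a list of its partner ids.
    per_partner: number of bouts each agent plays with each of its partners.
    """
    # remaining[agent][partner] counts the bouts still owed to that partner.
    remaining = {a: {p: per_partner for p in ps} for a, ps in partners.items()}
    order = []
    while any(remaining.values()):
        # One pass over the population: each agent plays at most one bout.
        for agent in partners:
            if remaining[agent]:
                partner = rng.choice(sorted(remaining[agent]))
                order.append((agent, partner))
                remaining[agent][partner] -= 1
                if remaining[agent][partner] == 0:
                    del remaining[agent][partner]
    return order
```

Interleaving the agents in this way means that conventions are negotiated across the whole population at once, rather than one agent finishing all of its bouts before the next one starts.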

In contrast, with P = 10, it is slightly harder to establish
a conventionalised usage because each agent has to negotiate
with an increased number of different agents, which
results in higher levels of signal diversity (Figures 4 and 5,
left). However, when the offspring of the first epoch’s mature
agents are forced to interact with a different subset of
the population in the second epoch, populations with higher
P values exhibit higher communicative success. The
increased signal diversity in the previous epoch, combined
with the immature agents learning from the successful bouts
of their parents, has resulted in these agents having
established a conventionalised usage that requires less
renegotiating when speaking to previously unencountered agents
than in the P = 1 case, where agents have a more idiosyncratic
language that requires them to renegotiate the conventions
established by their parents (Figure 5, right).

¹ Given enough epochs, it is likely that the agents would give
rise to a successful communicative system.

Figure 5: Graph depicting the number of successful bouts out of every 100 bouts, over all 10^6 bouts of a random agent during the first epoch
(left) and the second epoch (right), with P = 1 (blue line) and P = 10 (red line), where n = 20, Nm = 15, Ni = 15, B = 10^6, and u = 1.

This is why Figure 4 shows an increase in communica¬ 
tive success with higher values of P, while also indicating a 
negative trend in each of the data clusters for each specific P 
value; although it is harder to establish a language when ne¬ 
gotiating meaning-signal pairs with more individuals, doing 
so makes it more stable across generations (Figure 5). In¬ 
deed, as Figure 5 (right) shows, the agreed upon convention 
of usage in cases of lower P values has to be renegotiated 
in subsequent epochs due to it offering little communicative 
success to agents when communicating with newly encoun¬ 
tered individuals. 

It is important to note that the increase in communicative
success is the result of higher P values and not of another
variable, such as B. Indeed, Figure 6 demonstrates that the
average level of communicative success over twenty epochs is
significantly lower for P = 1 or 2, as compared to P = 4,
a trend that continues as P is increased. Furthermore, it can
be seen from Figure 6 that higher P values allow for an
increased amount of communicative success even when agents
have significantly fewer training sessions (lower B values).

However, in the real world, children are not just passive
receivers of linguistic input. They interact with others,
including other children, who may not yet be fully
linguistically competent. Thus, a number of model runs were
conducted whereby immature agents had B bouts with P other
immature agents while witnessing their parents’ bouts
(Figure 7). In other words, bouts are scheduled in a similar
manner to that described above, in that we allow each agent to
have one bout with a randomly selected partner, starting with
agent_1 and cycling through to the last agent before going
back to agent_1 again. In these runs, mature agents only
interact with mature agents and immature agents only interact
with other immature agents, although immature agents still
learn from their parent’s interactions, in the manner detailed
above.

Figure 7 demonstrates how performance in the P = 10
case is impeded by allowing interactions between immature
agents. This is to be expected, as linguistically
underdeveloped individuals interacting with one another will
add a degree of noise to the communicative system.

However, with P = 1, allowing immature agents to
interact with one another dramatically increases
communicative success. This difference in behaviour can again
be attributed to signal diversity. In the above results,
immature agents only learned from the interactions of their
parents, meaning they got a degenerate sample of the
language because they only ever witnessed the same two
individuals communicating during their first epoch. Here they
are also interacting and learning with another individual who is
likely to have witnessed two different mature agents
interacting with one another. This would increase the amount of
signal diversity in the immature agents’ training data.

Figure 6: Graph depicting the average amount of communicative
success over 20 epochs for various values of B and P, where
n = 20, Nm = 15, Ni = 15, B = 10^6, and u = 1. Averaged
over 30 runs.

Discussion 

The results presented here build upon a larger body of work; 
both in regards to signalling conventions (Skyrms, 2004, 
2010; Barrett, 2006, 2009) and expression/induction mod¬ 
els research in general (Hurford, 2002). It has been shown 
that a signal can acquire a conventionalised meaning without 
the Sender intending for it to do so. Moreover, the meaning 
of such simple signals is dependent upon the stabilisation 
of usage conventions, which emerge from functional histor¬ 
ical signal production. Thus, even the most automatic or 
reflexive signals can acquire meaning, so long as the produc¬ 
tion and response mechanisms are co-adapted to coordinate 
their behaviours in accordance with such an arbitrary signal 
(Harms, 2004). 

More interestingly, it has been shown that a population 
structure that allows for interaction between more of its 
members is beneficial in allowing it to evolve an efficient 
term-based language. 

More specifically, it has been shown that, as intuition 
dictates, while it is harder to establish a conventionalised 
system of usage with larger numbers of individuals, doing 
so enables the emerging language to persist in subsequent 
epochs. This is due to the input into language learners being 
initially more diverse, which prevents these learners from 
developing a more idiosyncratic, communicative system that 
makes it harder to communicate with previously unencoun¬ 
tered individuals. 

In addition, the results and behaviour reported here can 
be seen to be linked to the concept of a linguistic bottleneck. 
This refers to how the input data for a language learner will 
only be a subset of the potentially large range of grammars 
of the Speaker from which it is learning. 

A series of computer-based simulations that use the
method of iterated learning have shown that the linguistic
bottleneck is crucially important in regards to whether or
not language can be successfully passed from one generation
to the next and, in situations where this transmission
can be achieved successfully, show that it is also crucial to
the linguistic structure that arises (Kirby, 2002a,b; Kirby and
Hurford, 2002; Kirby et al., 2014; Smith, 2002; Smith et al.,
2003; Brace et al., 2015).

Figure 7: Graph depicting the average amount of communicative
success over 20 generations for P = 1 where only the Nm mature
agents interact with one another (blue solid line) and where
both the Nm mature and Ni immature agents interact with other
mature and immature agents, respectively (blue dashed line),
and likewise for P = 10 (red solid line and red dashed line,
respectively), where n = 20, Nm = 15, Ni = 15,
B = 10^6, and u = 1. Averaged over 30 runs.

Although a similar effect to the bottleneck is seen in other 
types of uni-generational models, such as the naming game 
(Steels, 1995), the model presented here is novel in that it 
demonstrates the impact of bottleneck-like behaviour in a 
generational-based simulation that explores term-based lan¬ 
guages. Here, this bottleneck-like behaviour takes the form 
of the way in which internal representations of individuals 
are induced from limited examples of the behaviour of other 
agents (Hurford, 2002). 

This supports other work that has demonstrated a link
between the linguistic bottleneck and the number of linguistic
tutors (Brace et al., 2015). Indeed, the behaviour seen in
Figure 7 indicates that the factors underpinning the cultural
transmission of language change and linguistic variation are
perhaps too complicated to be understood by analysing the
nature of just inter- and intra-generational transmission, and
that further research into linguistic change should focus on
the nature of the social network that underpins linguistic
populations (Wichmann and Holman, 2009; Lupyan and Dale,
2010; Reitter and Lebiere, 2010; Milroy, 2013).

This point becomes more important given that traditional
expression/induction models have largely ignored population
dynamics so as to focus on other aspects of language.
Although, given the aims of such models, it made logical
sense to opt for more simplistic population structures, it has
been shown here that population dynamics can have a
significant impact upon communicative behaviour.

Indeed, it would be interesting to explore how an expand¬ 
ing and contracting population size, with varying numbers 
of mature language users and immature language learners, 
could impact the emergence and form of a language (Johans¬ 
son, 1997; Hurford, 2002). An expression/induction model 
geared towards this interest could provide valuable insights 
for a growing body of research that is interested in the nature 
of the relationship between language and population change; 
such as the impact of population size on linguistic forms or 
the way in which periods of linguistic simplification tend 
to coincide with periods where there are a higher number 
of language-learners within a population (Johansson, 1997; 
Nettle, 1999; Wichmann and Holman, 2009; Lupyan and 
Dale, 2010; Milroy, 2013; Trudgill, 2013). These themes 
will form the basis of future work. 
Abstract 

We present a model for evolving agents using both genetic 
and cultural inheritance mechanisms. Within each agent our 
model maintains two distinct information stores we call the 
genome and the memome. Processes of adaptation are mod¬ 
eled as evolutionary processes at each level of adaptation 
(phylogenetic, ontogenetic, sociogenetic). We review rele¬ 
vant competing models and we show how our model im¬ 
proves on previous attempts to model genetic and cultural 
evolutionary processes. In particular we argue our model can 
achieve divergent gene-culture co-evolution. 

Introduction 

Evolutionary computation, a field that exploits the power 
of evolution, is a powerful tool for optimization, creativity 
and the study of evolutionary forces. Holland (1975) helped 
popularize evolutionary computation by applying the princi¬ 
ples of a basic evolutionary model to a computational task. 
Dawkins (1976) also considered the minimal requirements 
of evolution and proposed a simple model. 

Dawkins applied his model to the domain of human cul¬ 
ture introducing the field of memetics (see also Dennett 
(1995)). Dawkins suggests that as separate domains the 
realm of genetics and the realm of memetics both follow 
the same evolutionary principles. Theories of genetic and 
cultural co-evolution recognize that these two domains are 
not separate but parts of the same evolving system. While 
Dawkins’ simplified model addresses a single evolving sys¬ 
tem in isolation, we are interested in a model that incor¬ 
porates the interactions between genes and memes as both 
evolve. 

There have been many attempts to characterize the nature 
of cultural evolution coming from diverse motivations. For 
instance the term memetic algorithms now refers to a field 
of combinatorial optimization. Moscato et al. (1989) defined 
this new model in this way: 

Memetic algorithms is a marriage between a 
population-based global search and the heuristic 
local search made by each of the individuals. 


While the focus of Moscato was combinatorial optimization, 
where this model has proven of value, he has not created 
a model incorporating both genetic and cultural evolution. 
A model of cultural evolution must also incorporate an ex¬ 
change of learned information between individuals. In this 
sense Moscato’s model falls short of our aims. (It is worth 
noting that some implementations of memetic algorithms do 
incorporate an exchange of information.) 

In developing our own model of genetic and cultural
evolution (Marriott and Chebib, 2014) we have considered the
characteristics of an acceptable model. In our opinion any
acceptable model will incorporate three modes of adaptation:
phylogenetic (biological), ontogenetic (individual),
and sociogenetic (social). Phylogenetic adaptation is the
well known adaptation of genetic material through natural
selection, which is also known as biological evolution.
Ontogenetic adaptation is adaptation of the individual over its
lifetime and is often split into development (adaptation of
morphology) and learning (adaptation of behavior).
Sociogenetic adaptation is adaptation of cultural information
that is communicated through social learning mechanisms.

It is clear that these three modes of adaptation will be
coupled, that is, they will impact one another (Hinton and
Nowlan, 1987; Sznajder et al., 2012). We think that any
acceptable model of genetic and cultural evolution must
support divergence in the genetic and cultural evolutionary
trajectories. There are two types of divergence possible. If the
selection pressures on the genetic information and the memetic
information pull in the same direction we see divergence in
the speed of evolution. Cultural evolution is typically much
quicker in this case. Yet when the selection pressures on
genetic information and cultural information pull in different
directions the model should allow the genetic information
and the cultural information to diverge.

In human culture we can see that this divergence can
occur in individual humans. For instance, a Catholic priest
may swear an oath of celibacy because his culture rewards
him for it. A Samurai may kill himself if he feels his
cultural obligations have not been met. Refusing to reproduce
and killing oneself are both acts that are contrary to the
genetic imperative but in these cases support the individual’s
cultural imperative.

Other more drastic cases occur when a whole culture 
adopts behavior contrary to the biological imperative. This 
includes the celibate religious sect the Shakers from the 
1770s and a number of mass suicides including Jonestown 
in 1978 and the Heaven’s Gate cult in 1997. The tragic end 
of these cultures is usually their own destruction. 

Other models of genetic and cultural evolution have been 
presented but very few satisfy all of our desired proper¬ 
ties. In this paper we will review evolutionary models and 
evaluate them as models of gene-meme co-evolution using 
the two principles outlined above. After reviewing relevant 
models we will present our model and elaborate on how it 
integrates benefits of multiple models and satisfies our aims. 
Finally we will discuss the potential for divergence of ge¬ 
netic and cultural evolutionary trajectories with our model. 

Prior Models 

Dawkins suggested evolution can occur in any population 
of information bearing agents so long as the population had 
three properties: heredity, variation, and selection. This sim¬ 
ple model of evolution we will call the basic evolutionary 
model. 

To demonstrate the applicability of this model to non- 
biological populations Dawkins coined the term meme to 
name the unit of cultural selection. He argued that popu¬ 
lations of memes satisfy the three properties and thus also 
undergo a process of evolution. 

Dawkins’ application showed that evolutionary principles
can be applied to other domains. However, his model did not
describe the dynamics of a system undergoing both genetic
and cultural evolution. In this section we will review the
basic evolutionary model as presented by Dawkins and then
review models that expand upon this model. Our goal is to
evaluate these expanded models for their value as models of
gene-culture coupled systems.

Basic Evolutionary Model 

Dawkins’ evolutionary model is the basis of all the models 
we will consider in this section. The model exists as a set of 
minimal conditions for a system to evolve. The other models 
in this section are explorations of additional properties and 
mechanisms that enrich this basic model. 

As mentioned above the basic evolutionary model re¬ 
quires a population of agents. These agents must bear some 
type of information that is critical to their survival, and 
they must possess a means of replicating this information. 
Dawkins calls these types of agents replicators. 

A population of replicators is not sufficient. The
population must also have three properties. When information
is replicated it must be replicated with some degree of
fidelity, that is, it must be replicated within some reasonable
error rate (heredity). The information among agents in the
population must be varied (variation). Lastly, the
information among agents in the population must determine which
agents are pruned from the population and which agents can
replicate (selection).

Figure 1: Basic Evolutionary Model

At an abstract level this model makes little commitment to 
how replication occurs and what the rules for selection are. 
This means, for instance, that both natural selection and ar¬ 
tificial selection seem to satisfy the conditions. Nonetheless, 
the standard analogy is to biology so the basic evolutionary 
model tends to be characterized in terms of biological evo¬ 
lution. 

In the basic evolutionary model the information is en¬ 
coded in the genome (Fig. 1). One or more parents con¬ 
tribute information to a replication process that creates a new 
agent and genome. In the simplest model the information in 
the genome is directly referenced for the selection process 
(i.e. there is no interpretation of the information). 
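
A minimal sketch of the basic evolutionary model, with heredity, variation, and selection operating directly on the genome, might look as follows. The bit-string genome, the count-the-ones fitness, and the function names are illustrative assumptions, not part of the model as described by Dawkins.

```python
import random


def fitness(genome):
    # Selection reads the genome directly: no development or interpretation step.
    return sum(genome)


def replicate(genome, mutation_rate, rng):
    # Heredity with imperfect fidelity: each bit is copied, occasionally flipped.
    return [1 - bit if rng.random() < mutation_rate else bit for bit in genome]


def evolve(pop_size=20, length=16, generations=30, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    # Variation: the initial population holds different random genomes.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = [max(map(fitness, population))]
    for _ in range(generations):
        # Selection: the fitter half survives, the rest are pruned.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        offspring = [replicate(rng.choice(survivors), mutation_rate, rng)
                     for _ in range(pop_size - len(survivors))]
        population = survivors + offspring
        best.append(max(map(fitness, population)))
    return best
```

Because the surviving half is carried over unchanged, the best fitness recorded by `evolve()` never decreases from one generation to the next.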

Dawkins suggests that this model is the minimum re¬ 
quired for the force of evolution to occur, not that this is 
a complete or proper model of biological evolution. How¬ 
ever researchers have implemented this basic model many 
times in silico demonstrating that this minimal model can 
lead to evolution. Many researchers, including Dawkins, 
have expanded on this standard model to describe biolog¬ 
ical as well as cultural phenomena (Lumsden and Wilson, 
1981; Dawkins, 1982; Boyd and Richerson, 1983; Henrich 
and McElreath, 2003). 

Agent Based Models 

Agent based models are commonly used in biology and so¬ 
cial sciences for modeling phenomena involving populations 
of agents (Bonabeau, 2002; Epstein, 2006; Niazi and Hus¬ 
sain, 2011; Smaldino et al., 2012). A typical agent based 
model consists of a population of agents. These agents, as in 
the basic evolutionary model, bear information that is used 
to make decisions or select behavior as they interact with 
their environment and each other. 

Information is replicated in episodes of social interaction.
Agents exchange information with each other. Selection of
behavior results in better or worse performance in the
environment and so the information can evolve over time.

Figure 2: Agent Based Model

The information in an agent based model is now meant 
to represent learned information (not genetic) so we call it 
the memome (Fig. 2). While these models do not follow the 
standard biological characterization of the basic evolution¬ 
ary model we suggest they nonetheless are explorations of 
variations on the basic evolutionary model. 

Agents do not die and are not born in the simplest agent 
based models (some do incorporate this and some kind of 
genetic evolution but we would classify them in one of the 
categories that follow). 

These agent based mechanisms satisfy the three proper¬ 
ties of the basic evolutionary model. Information is copied 
with some fidelity in the replication process. Agents clearly 
have different information by the design of agent based mod¬ 
els. Third, as argued above, selection also operates in these 
models. Our conclusion is that agent based models are ex¬ 
plorations of the parameter space of the basic evolutionary 
model. 

The value of agent based models is unquestioned. 
Nonetheless most agent based models exemplify the basic 
evolutionary model with a non-standard (i.e. non-biological) 
interpretation. 

Horizontal Transfer Model 

Horizontal transfer of information has been suggested as a 
hallmark of cultural evolution (Gonzalez et al., 2014). Hor¬ 
izontal transfer models are a blend between the basic evo¬ 
lutionary model and the social mechanisms used in agent 
based models. The horizontal transfer model (Fig. 3) still 
relies on a standard parent to child information transfer dur¬ 
ing reproduction (since this transfer is unidirectional and al¬ 
ways passes from parents to children this is called a vertical 
transfer). 

The social mechanisms of agent based models also transfer
information from agent to agent but these transfers are
bidirectional and can typically occur anytime, not just
during instances of reproduction. In particular this means that
during the lifetime of an agent it can change its internal
information through these social transfers.

Figure 3: Horizontal Transfer Model

This transfer is called horizontal transfer because it can 
occur during the lifetime of the agent and can occur between 
agents of the same generation. In particular information 
can be transferred in any direction including parent-to-child, 
child-to-parent, sibling-to-sibling and in general, agent-to- 
agent. 

While horizontal transfer is a characteristic of cultural
evolution we do not think that it is a sufficient
characteristic. It can be shown that evolutionary models with
horizontal transfer have some benefits over strictly vertical
transfer models (Tomko et al., 2013). These simulated results
are also backed by research on horizontal transfer of genetic
material among bacteria, plants and fungi (Syvanen and
Kado, 2001; Syvanen, 2012). The biological results stress
that, as horizontal transfer of genetic material does occur in
the natural world, we should treat horizontal transfer
models as models of biological evolution. That is, the horizontal
transfer model described here is a valuable enhancement of
the vertical transfer interpretation in the basic evolutionary
model but it does not model the gene-culture coupled
system.

Evolutionary Developmental Model 

Earlier we introduced the field of memetic algorithms. The 
memetic algorithm model adds an additional stage to the 
naive evolutionary model. Agents are bred and born as in 
the basic model. However, the genetic information is not the 
information used for selection (as it is in the previous mod¬ 
els). Instead a local search is conducted around the genetic 
information for possibly better information. This informa¬ 
tion is instead used for selection. 

In biology this is called the genotype-phenotype
distinction and the process of mapping a genotype to a phenotype
is called development (Hall, 2012). Development in biology
is commonly split into morphological development (growth)
and behavioral development (learning). Adding both growth
and learning to evolutionary simulations has been a common
improvement over the basic evolutionary models (Hinton
and Nowlan, 1987; Sznajder et al., 2012; Marriott and
Chebib, 2014). The biological model that best captures
evolution and development is called the evolutionary
developmental model or evo-devo for short.

Figure 4: Evolutionary and Developmental Model

Figure 5: Lamarck Model

Fig. 4 shows the standard evo-devo model. The genome 
interacts with the environment through a process of develop¬ 
ment to produce the phenotype. The phenotype is a second 
store of information that is used in selection. However if re¬ 
production occurs, it is only the genetic information that is 
passed on. 

This makes a distinction between the information trans¬ 
ferred in transfer events and the information used in evalu¬ 
ation for the purposes of selection. In the basic model this 
was the same information. In the evo-devo model we sepa¬ 
rate these two kinds of information into two different infor¬ 
mation stores as well as provide rules for how to develop a 
phenotype from a genotype. 

A variant of the evo-devo model that is commonly found 
in memetic algorithms and other simulations corresponds 
to Neo-Lamarckian evolutionary theory. The primary dis¬ 
tinction between Neo-Lamarckian and Darwinian evolution 
is that learned traits can be passed on in Neo-Lamarckian 
models. Fig. 5 shows a Neo-Lamarckian evo-devo model in 
which the information in the phenotype is transferred during 
reproductive events instead of the genotype. 

A Neo-Lamarckian evo-devo model can be implemented
with only a single information store (genetic) in which
development consists of internal changes to the genome through
interaction with the environment. When reproductive events
occur the current state of the genome is transferred. In these
models the information added through development can be
passed on in reproduction.
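
The difference between the Darwinian and Neo-Lamarckian variants can be sketched with a toy example. Everything here is an illustrative assumption: the bit-string genome, the count-the-ones fitness, the hill-climbing `develop` step standing in for development or local search, and the `lamarckian` flag.

```python
import random


def fitness(bits):
    return sum(bits)


def develop(genome, steps, rng):
    """Map genotype to phenotype by a simple local search (hill climbing)."""
    phenotype = list(genome)  # the genome itself is left untouched
    for _ in range(steps):
        i = rng.randrange(len(phenotype))
        candidate = phenotype[:]
        candidate[i] = 1 - candidate[i]
        # Keep the change only if it does not reduce fitness.
        if fitness(candidate) >= fitness(phenotype):
            phenotype = candidate
    return phenotype


def reproduce(genome, phenotype, lamarckian):
    # Darwinian: only the genotype is inherited.
    # Neo-Lamarckian: the developed phenotype is inherited instead.
    inherited = phenotype if lamarckian else genome
    return list(inherited)
```

In both variants selection would act on `fitness(phenotype)`; the two differ only in which information store is copied at reproduction.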

Evo-devo models are well studied in biology and are pop¬ 
ular evolutionary models for simulations and optimization. 
These models capture phylogenetic and ontogenetic adapta¬ 
tion but do not model sociogenetic adaptation. 

Evolved Social Learning Neural Networks 

The models reviewed so far have been variants or expansions 
of the basic evolutionary model. Most have clear biological 
instances in the natural world. While all model important 
processes, none succeed at modeling the interplay between 
genetic and cultural information. 

In our opinion the best attempts to model a coupled ge¬ 
netic and cultural system through simulation so far have 
come from researchers trying to evolve neural networks that 
also engage in phases of social learning (Gabora, 1995; 
Denaro and Parisi, 1997; Baldassarre, 2001; Smith, 2002; 
Acerbi and Parisi, 2006; Curran and O’Riordan, 2007; Borg 
et al., 2011). While these experiments have had varied levels 
of success we believe this type of model is on the right track. 

Agents in this model have a genome that encodes an ar¬ 
tificial neural network (typically the weights of a predeter¬ 
mined network topology). Evolution of this information is 
carried out following the basic evolutionary model. 

However, during the lifetime of the neural network the 
network can engage in learning. A basic type of neural 
network learning is backpropagation. Backpropagation 
training requires a supervisor, and most environments 
are not designed to supervise learning. In these models other 
networks provide the expected output to the learning 
network in a stage of social learning. Note that there are other 
possible means of training a network, such as neuromodulated 
plasticity (Soltoggio et al., 2008). 

The primary advantage of this model is that there are two 
distinct information stores for genetic and cultural 
information. The genome is inert and is only used to build the initial 
network. The weights of the network are stored 
independently of the genome and can be seen as a second 
information store. This store changes over the lifetime of the 
agent and is active in selecting behavior (i.e. in generating 
the phenotype that is relevant to selection). 

Figure 6: Dual Inheritance Model 

Despite properly modeling the relationship between ge¬ 
netic and cultural information these models suffer from 
drawbacks due to the choice of artificial neural networks. 
Artificial neural networks still represent a very simple model 
of biological neural networks. Evolving ANN weights and 
topology is very challenging despite new advances like the 
NEAT algorithm (Stanley and Miikkulainen, 2002). Trans¬ 
ferring information from one ANN to another using super¬ 
vised training is slow, unreliable, prone to errors, and arti¬ 
ficially requires training sessions. We pick up where these 
experiments leave off by clearly describing the model and 
presenting a different implementation choice that is easier 
to work with than neural networks. 

The Dual Inheritance Model 

The dual inheritance model exploits the advantages of the 
evolved social learning neural networks. Genetic and cul¬ 
tural information are stored separately and perform separate 
roles. We pivot from neural networks and embrace 
evolutionary processes. That is, we model phylogenetic, 
ontogenetic, and sociogenetic adaptation as populations of 
individuals competing for fitness. 

At birth a new agent inherits its genome from its parents 
(Fig. 6). Through a process of development the genome 
produces the newborn’s memome. The genome remains in¬ 
ert over the lifetime of the agent while the memome is ac¬ 
tive in behavior selection and adaptive through learning. The 
memome interacting with the environment creates the phe¬ 
notypic information which is active in selection. The mem¬ 
ome is modified through interaction with the environment 
(learning) and through interaction with other agents (social 
learning). 
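
The information flow just described can be summarized in a small sketch. This is an illustration under simplifying assumptions, not the authors' code: the names `Agent`, `birth`, `learn` and `social_learn` are hypothetical, and development is reduced to copying the whole genome into the memome as a single entry.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Path = Tuple[int, ...]  # hypothetical encoding: a sequence of site ids


@dataclass
class Agent:
    genome: Path                                      # inert after birth
    memome: List[Path] = field(default_factory=list)  # changes during life


def birth(inherited_genome: Path) -> Agent:
    """Vertical transfer followed by development of the initial memome."""
    child = Agent(genome=tuple(inherited_genome))
    child.memome.append(child.genome)  # simplified development step
    return child


def learn(agent: Agent, improved: Path) -> None:
    """Individual learning only ever touches the memome."""
    agent.memome.append(tuple(improved))


def social_learn(teacher: Agent, learner: Agent, index: int) -> None:
    """Horizontal transfer: a memome entry moves between agents."""
    learner.memome.append(teacher.memome[index])


a = birth((1, 2, 3))
learn(a, (1, 2, 4))
b = birth((5, 6))
social_learn(a, b, 1)
print(a.genome, b.genome, b.memome)
```

Both genomes are unchanged at the end, while the learned path has moved from one memome to the other.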

We have implemented this model once before (Marriott 
and Chebib, 2014). Here we describe relevant implementa¬ 
tion details from our current system (Marriott and Chebib, 
2016b,a). Agents in our simulation exist in a random ge¬ 
ometric network of food sites. During the day they spend 
energy moving around, foraging for food, breeding, learn¬ 
ing and social learning. At the end of the day their food is 
converted to energy. An agent that runs out of energy dies, 
and one that stores enough surplus energy can reproduce. 

Genome 

The genome of an agent represents a path of sites through the 
random geometric network. At each site the gene determines 
what strategy to use to gather food, and whether to engage in 
breeding, learning or social learning at that site. The entire 
genome represents a very long path that cannot be completed 
in one day. 

During an agent’s lifetime the genome plays two roles. It 
must create the memome and during reproductive events it 
is used in recombination. Details of our agent’s genetics can 
be found in Marriott and Chebib (2015). 

If an agent has a surplus of energy and is prepared to breed 
then it must spend time breeding during the day. It does this 
at a particular site at a particular time of day. If there is 
another agent also performing the breed action at the same 
site at the same time then sexual reproduction occurs. If not 
the agent must wait for its next opportunity to breed. 

Upon birth the genome copies short segments of itself into 
the memome. Each segment represents a path through the 
network long enough to be completed in a single day and 
starting at a specific site. We call these segments 
memeplexes and we copy every possible memeplex from the 
genome into the memome during initial development. We 
consider this technique similar to the MAP-elites strategy 
for multi-objective optimization (Mouret and Clune, 2015). 
We want to find the best memeplex given a particular 
starting site so we store the best memeplex for each starting site. 

Memome 

The memome is a collection of memeplexes. During behav¬ 
ior selection an agent selects the best memeplex given the 
current site. First an agent gathers the memeplexes that start 
at this site. The agent then selects the memeplex with the 
highest expected resource reward at the lowest energy cost. 
This is the agent’s behavior for the day. 

The memome is not inert over the lifetime of an agent. A 
newborn agent has memeplexes that are directly copied from 
the genome. Over time new memeplexes are added to the 
memome through individual learning and social learning. 

Individual learning in our model occurs only if the agent 
spends time engaged in learning during the day. This re¬ 
quires the agent to select a memeplex for the day in which 
the agent spends time learning at at least one site. If it does 
so then the memeplex will add a possibly mutated copy of 
itself to the memome. This allows the agent to, among other 
things, optimize its foraging strategies. 

Individual learning is a process in which the memome can 
improve itself via interaction with the environment. This is 
a developmental process that is analogous to the one from 
the evo-devo model above. In our experiments agents with 
only individual learning can improve their behavior with this 
mechanism but these improvements are lost when the agent 
dies. 

Social learning is the important additional feature our 
agents need to achieve cumulative cultural evolution. In or¬ 
der for an optimized memeplex to survive the death of its 
host it must be shared with another host. Social learning in 
our implementation is similar to breeding. For two agents 
to learn from one another they must both perform the so¬ 
cial learning action in the same site at the same time during 
the day. If they do they exchange a possibly mutated copy of 
the memeplex that they used that day. We treat this exchange 
as roughly approximating the agents telling each other what 
they did that day. 

Individual learning is the power that allows the agent to 
optimize its behavior. This optimization is lost if it is not 
shared. Social learning is the power that allows a population 
to share optimized behaviors. The collection of all agents’ 
memeplexes is called the memosphere. 

A memeplex’s evolutionary goal is to maintain copies of 
itself in the memosphere. To do this it must be optimized 
and so this usually means it spends time learning. 
Furthermore, it is critical that it spend time social learning so that it 
can spread itself to other agents. Note that a memeplex is 
not concerned with spending time breeding. Breeding does 
not help the memeplex spread or optimize. This will be a 
source of divergence. 

Divergence 

We have observed divergence of trajectories in both our prior 
work and in the model described here. There are two types 
of divergence that we have observed. First, both the genome 
and the memeplexes are trying to optimize themselves. The 
memeplexes have many opportunities to improve while the 
genome only has an opportunity when it reproduces. This 
means that the memeplexes will optimize more rapidly than 
the genome. Both trajectories are in the same direction but 
one gets there much faster. We call this divergence 
under cooperative selection pressures. Fig. 7 summarizes 
results from our current implementation (Marriott and Chebib, 
2016b). 

Figure 7: Contrasting the average genetic optimization and 
average memeplex optimization over time for two 
populations of agents. Breeders are a control group using the 
evo-devo model. Socializers follow the dual inheritance model. 

The second type of divergence occurs when the selection 
pressures are competitive. We mentioned above that breed¬ 
ing does not help the memeplex. In fact it hinders it as 
time is better spent foraging than breeding and the meme¬ 
plex needs to be as optimal as possible. Selection pres¬ 
sure against breeding in the memosphere is strong and we 
see many optimized memeplexes that spend no time 
breeding. In contrast, selection pressure for breeding is very 
strong in the genome. Most agents have genomes that 
allocate a lot of time to breeding. This is a case of divergence 
under competitive selection pressures. 

This divergence has an interesting effect. Memeplexes 
that spend no time breeding will suppress breeding in the 
agents that select them. This can create a culture of celibacy. 
If this culture becomes dominant it runs the risk of wiping 
out the population. We do indeed observe this in our 
current implementation: 21 out of 100 runs ended with 
complete population collapse before 5000 days. This was not 
observed in our prior work. In Marriott and Chebib (2014) 
agents always have the chance to reproduce, so colony 
collapse was not possible for this reason. 

This divergence also occurs in both implementations 
relative to learning and social learning. We observe that there 
is strong selection pressure for learning and social learning 
in the genome. This is due to the benefits these adaptive 
mechanisms grant the agent from an evolutionary 
perspective. While there is selection pressure for learning and 
social learning in the memosphere there is also selection 
pressure to optimize these processes as much as possible. This 
means that once learning no longer pays off it is also 
commonly eliminated. Social learning is commonly minimized 
as much as possible to ensure a highly optimized memeplex that 
can still spread itself. Fig. 8 summarizes results from our 
current implementation (Marriott and Chebib, 2016b). 

Figure 8: Contrasting the average time spent breeding, 
learning and socializing between two populations of agents. 
Breeders are a control group using the evo-devo model. 
Socializers follow the dual inheritance model. 

This divergence leads to a number of phenomena we wish 
to explore in more detail. Young agents have behaviors dic¬ 
tated largely by their genome. This means they spend a lot 
of time breeding, learning, and social learning. In a mature 
culture at some point the young agent will learn optimized 
memeplexes from others and alter its behavior. It will often 
no longer spend time breeding or learning. In any case it 
will spend as little time as possible breeding, learning and 
social learning. When it engages in social learning it usually 
does not get a memeplex better than its current ones, instead 
it is sharing its collection of memeplexes with others. 

This causes a phenomenon where young agents breed, 
learn and engage in social learning more often than older 
agents. We observe that this is also the case in humans. Young 
humans engage in a considerable amount of learning and 
social learning. Young adults are more likely to have children 
than older adults. We hope to explore whether this is an artifact 
of our implementations or a feature of other dual inheritance models. 

Discussion 

We have incorporated the benefits of all of the discussed 
models into our own with an attempt to capture a model 
that includes both genetic and memetic co-evolution as ac¬ 
curately as possible. Our model has the vertical transfer of 
the biological model as well as the horizontal transfer of the 
cultural model while avoiding the drawbacks of a single in¬ 
formation storage that previous models suffer. 

The key benefit of additional information storage is that 
the two stores can diverge. This idea is already present in 
the evo-devo model. In this model the information of the 
genotype is passed on through reproduction while the infor¬ 
mation in the phenotype is used for selection. The benefit of 
this model (in both biology and simulation) is that the infor¬ 
mation in the phenotype can adapt over the lifetime to bene¬ 
fit passing on the information in the genotype of an agent. 

This has a twofold benefit. First, the phenotype is free to 
adapt to any circumstance facing the agent during its life¬ 
time. Second, the genotype is not disturbed in this adapta¬ 
tion and can be passed on intact to the next generation. 

We can consider this benefit as a divergence of adaptive 
trajectories between the genotype and the phenotype. The 
genotype can still focus on replicating itself while the phe¬ 
notype can focus on keeping the agent alive. 

One advantage to the genotype in this arrangement is that 
the phenotype has a limited lifespan. After the agent dies 
the adaptations in the phenotype are lost and cannot upset 
the genotype’s goal of replication. In this case, the phe¬ 
notype is subordinate to the genotype and can only adapt 
within the boundaries dictated by the genotype. While these 
information stores can diverge from one another we say the 
phenotype in these models is tethered to the genotype. It 
can diverge but no further than the genotype allows (Mar¬ 
riott and Chebib, 2014). 

Our model adds a new layer of information: the 
memome. The memome is also distinct from the genome and so 
can evolve on its own trajectory. Thanks to social learning 
it can avoid the death sentence of the phenotype. When the 
agent dies, if it has spread its cultural information to another 
agent, its memes can still live on. Unlike the information in 
the phenotype that exists tethered to the genotype, the 
memotype is free to evolve along its own trajectory. 

The information in the memome is not completely free 
of the genome. In human culture, and in our simulations, 
the existence of memes is still dependent upon the existence 
of the agents that house them. These agents are biological 
and thus if the memome diverges so far as to endanger the 
genome it may endanger itself as well. So when we discuss 
divergence we do not mean a complete decoupling of genetic 
and cultural systems, but rather a very long leash. Cultures 
are able to destroy their host but in so doing they destroy 
themselves as well. 

It is worth noting that our model does not expand on the 
implicit model of evolved social learning neural networks 
mentioned above. The implicit model in these experiments 
is identical to our model. One improvement we have made 
in terms of implementation is to make better choices of un¬ 
derlying structures and mechanisms. In particular, for sim¬ 
plicity, we have modeled the genome and memome as stor¬ 
ing the same type of information. As a result our learning 
and social learning processes mimic the underlying genetic 
mechanisms of mutation and recombination. This makes 
modeling and implementing these processes much simpler 
and we believe this contributes to the success of our agents. 

Abstract 

Divergent cumulative cultural evolution occurs when the cul¬ 
tural evolutionary trajectory diverges from the biological evo¬ 
lutionary trajectory. We consider the conditions under which 
divergent cumulative cultural evolution can occur. We 
hypothesize that two conditions are necessary. First, genetic 
and cultural information must be stored separately in the agent. 
Second, cultural information must be transferred horizontally 
between agents of different generations. We implement a 
model with these properties and show evidence of divergent 
cultural evolution under both cooperative and competitive se¬ 
lection pressures. 

Introduction 

Social learning is a form of learning that arises from social 
situatedness (Lindblom and Ziemke, 2003) and is character¬ 
ized by agents interacting with one another in order to learn. 
Social learning can accelerate learning beyond that of 
individual learning strategies (see Marriott and Chebib (2014); 
Marriott et al. (2010)) and, most notably, can 
support cumulative cultural evolution (Whiten et al., 2011; 
Mesoudi et al., 2006; Henrich and McElreath, 2003; Boyd 
and Richerson, 1996). Cumulative cultural evolution is an 
adaptive process in which each generation can make im¬ 
provements on the learned information inherited from their 
parents’ generation (Dean et al., 2014; Kempe et al., 2014). 

Genetic evolution and cultural evolution are parallel pro¬ 
cesses that optimize information in a population. It is com¬ 
mon to consider the interaction between these parallel pro¬ 
cesses. Two effects have been well discussed with respect 
to learning: the hiding effect is when learning shields genet¬ 
ics from selection pressure, thus slowing the evolutionary 
process, and the Baldwin effect is when learning stimulates 
genetics, increasing particular selection pressures, and thus 
speeding up evolutionary adaptation (Sznajder et al., 2012; 
Paenke et al., 2006; Baldwin, 1896). These effects help de¬ 
scribe the interaction between these parallel processes when 
they cooperate to improve fitness. 

Another way we can compare genetic and cultural 
evolution is according to the direction of the evolutionary 
trajectory. It is possible for the cultural evolutionary trajectory to 
diverge from the biological evolutionary trajectory. In 
particular, this means that the culture may evolve in directions 
that are neutral or even detrimental to the biological 
imperatives of an agent or its genes. The biological imperatives of 
genes are merely survival and reproduction (Dennett, 1995; 
Dawkins, 1976). A divergent culture may be one that 
impedes an agent’s ability to survive and/or reproduce. 

A simple non-human example of divergent evolutionary 
trajectories is sexual selection. Females could select for 
traits that correlate with fitness. In this case the culture and 
the genetic evolution agree. However, females could select 
for traits that do not correlate with fitness or correlate nega¬ 
tively with fitness. In certain birds of paradise the power of 
female selection has frustrated the fitness of males. Techni¬ 
cally, this is not a case of cultural divergence since in sexual 
selection divergence occurs between two genetic evolution¬ 
ary trajectories (male and female) within a single species. 

Some human cultures may frustrate the reproductive or 
survival capabilities of some of their members (usually for 
the apparent benefit of the culture). For instance, a Catholic 
priest will abstain from reproducing according to the rule of 
his culture. Also, a Samurai might kill himself out of shame 
for failing to meet his cultural obligations. These cultural 
practices impede the genetic imperative of the genes in these 
individuals. 

An extreme and rare case of cultural divergence would be 
a case where every individual of the culture engages in detri¬ 
mental cultural practices. This would include cases of mass 
abstinence or mass suicide. Mass abstinence was a cultural 
belief of the Shakers (in the 1770s-1780s). Some mass 
suicides are caused by fear of capture by an enemy culture (as 
in Masada, Israel in 74 CE or in Pilėnai, Lithuania in 1336 
CE). Others are caused by a cultural belief that the suicide 
will grant reward in the afterlife (as with the Heaven’s Gate 
mass suicide in 1997) or avoid punishment in this life (as in 
Jonestown in 1978). 

We believe divergent cultural evolution requires at least 
two properties. First, genetic and cultural information must 
be stored in separate information stores. This rules out 
models with horizontal transfer of genetic material. Second, 
horizontal transfer of cultural information must occur between 
individuals of the same generation or across generations. This 
property rules out evolutionary development models where 
phenotypic information is not transferred between 
individuals. 

We believe these are necessary conditions and we believe 
they are probably not sufficient conditions. It is difficult to 
test this hypothesis since our implementation has many other 
implicit conditions that may play an important role. Our 
experiment is designed to test whether these conditions (plus 
implicit others) can lead to divergent cultural evolution in 
our population. 

In Marriott and Chebib (2014) we demonstrated a sim¬ 
ple instance of divergent genetic and cultural evolution in a 
population with these properties. The experiment involved a 
simple optimization problem, asexual agents, and no spatial 
environment. Agents in that experiment showed accelerated 
optimization and divergence of selection pressures for par¬ 
ticular genes and memes. 

We have reproduced this experiment in a virtual environ¬ 
ment representing real space. Our agents engage in sexual 
reproduction and are subject to natural selection. Our first 
experiments in this environment involved simple agents with 
no learning capabilities (Marriott and Chebib, 2015a,b). We 
have augmented these agents with individual and social 
learning mechanisms. 

Divergent Cultural Evolution 

In Marriott and Chebib (2014) we implemented a simple 
proof of concept and demonstration of divergent cumula¬ 
tive cultural evolution. Agents in our model engage in all 
three modes of adaptation: phylogenetic, ontogenetic, and 
sociogenetic. Phylogenetic adaptation is adaptation by ge¬ 
netic evolution. Ontogenetic adaptation is lifetime adapta¬ 
tion. Sociogenetic adaptation is the result of exchanging 
learned material between agents. 

The dual inheritance model (see Fig. 1) describes how 
these three modes of adaptation interact to create the agent 
(Marriott and Chebib, 2016b). Genetic information is in¬ 
ert over the lifetime of the agent in our model. It is trans¬ 
mitted vertically (from parent to child) during reproductive 
events and is responsible for the creation of initial cultural 
(memetic) information that is active during an agent’s life. 
Memetic information is used during the lifetime of the agent 
to select a behavior for a particular situation and this infor¬ 
mation is also adaptive. 

As agents are units of selection in our simulation, natural 
selection occurs on the lifetime behavior of an agent (i.e. its 
phenotype). This behavior is determined by an interaction 
of an agent’s genome, memome and environment. As a 
result both genetic and memetic information are important in 
determining if an agent lives and reproduces. In our model 
memetic selection also occurs. However, this selection is 
carried out by agents when they select what information to 
use, what information to share and whether or not to share 
their information. 

Figure 1: The dual inheritance model. 

There are two important ways that divergence can 
occur between genetic and cultural evolution. It is common 
that evolutionary trajectories in both the genetic and cultural 
realm are aligned. This is common when they are both 
trying to optimize a behavior. In these cases it is expected 
that cultural optimization of the behavior will outpace 
genetic optimization, primarily due to the different 
timescales of these adaptive mechanisms. The only 
divergence here is in terms of the rate of optimization. We call 
this divergence under cooperative selection pressures. 

The second type of divergence occurs when genetic selec¬ 
tion pressures and cultural selection pressures are contrary. 
For instance, sexual reproduction is favored by genetic se¬ 
lection but suppressed in many (human and non-human) cul¬ 
tures. We call this divergence under competitive selection 
pressures. 

We believe that both types of divergence require the prop¬ 
erties stated above. That is, genetic and cultural information 
must be separate and cultural information must be transmit¬ 
ted horizontally. We will test our implementation for both 
types of divergence. 

Model 

We have improved on our proof of concept by placing our 
agents in an environment in which they compete for 
resources and are subjected to a form of natural selection (i.e. 
compete for mates) instead of artificial selection (i.e. face a 
fitness function). 

Figure 2: A typical gene consisting of gathering, 
non-gathering, and travel components. 

Our agents live in a random geometric network of re¬ 
source sites (Penrose, 2003). Random geometric networks 
are an approximation of two dimensional physical space. 
At each site agents can spend time gathering the resources 
available at that site. Sites in our current model have one, 
two or three resources available to an agent that gathers at 
that site. Agents in our model have genetically or 
memetically encoded strategies for gathering at a site and the 
strategy determines the energy cost to the agent. The energy cost 
is always at least the number of resources available at that 
site (one, two or three) and at most five. 
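
As an illustration of this environment, the following minimal sketch builds a random geometric network by scattering sites uniformly in the unit square and linking any two sites closer than a chosen radius, and checks the stated bounds on gathering cost. The function names, the `radius` and `seed` parameters, and the use of the unit square are assumptions made for the example; the source does not specify how its networks are generated.

```python
import math
import random
from typing import Dict, List, Tuple

MAX_GATHER_COST = 5  # upper bound on gathering cost stated in the text


def random_geometric_network(
    n_sites: int, radius: float, seed: int = 0
) -> Tuple[List[Tuple[float, float]], Dict[int, List[int]]]:
    """Place sites uniformly at random and connect pairs within `radius`."""
    rng = random.Random(seed)
    positions = [(rng.random(), rng.random()) for _ in range(n_sites)]
    neighbours: Dict[int, List[int]] = {i: [] for i in range(n_sites)}
    for i in range(n_sites):
        for j in range(i + 1, n_sites):
            if math.dist(positions[i], positions[j]) <= radius:
                neighbours[i].append(j)  # edges are undirected
                neighbours[j].append(i)
    return positions, neighbours


def gather_cost_is_valid(resources_at_site: int, cost: int) -> bool:
    """Cost is at least the number of resources at the site and at most five."""
    return resources_at_site <= cost <= MAX_GATHER_COST


positions, neighbours = random_geometric_network(20, 0.3, seed=1)
print(len(positions), sum(len(v) for v in neighbours.values()) // 2)
```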

Agents have a simple metabolism in which resources are 
converted into energy. Energy is used to move around 
the environment, gather resources, and perform actions like 
breeding, learning and social learning. Additional small 
daily energy penalties are administered for idle activity, old 
age, and length of genome (only during reproduction). At 
the end of each day an agent has a net gain or loss of energy 
and this contributes to whether the agent lives or dies and 
whether it has enough energy to reproduce. 
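
The daily energy accounting can be sketched as follows. The conversion rate `ENERGY_PER_RESOURCE`, the `reproduce_threshold` parameter, and the rule that an agent dies when its energy reaches zero are assumptions made for this example (the text gives no numeric values), and the genome-length penalty applied during reproduction is omitted.

```python
from dataclasses import dataclass
from typing import Tuple

ENERGY_PER_RESOURCE = 1.0  # hypothetical conversion rate


@dataclass
class DayLedger:
    resources_gathered: float
    travel_cost: float
    gather_cost: float
    action_cost: float      # breeding, learning and social learning
    penalties: float = 0.0  # idle activity and old age


def net_energy(day: DayLedger) -> float:
    """Energy gained from resources minus all energy spent that day."""
    income = day.resources_gathered * ENERGY_PER_RESOURCE
    spent = day.travel_cost + day.gather_cost + day.action_cost + day.penalties
    return income - spent


def end_of_day(
    energy: float, day: DayLedger, reproduce_threshold: float
) -> Tuple[float, str]:
    """Update stored energy and report the agent's status."""
    energy += net_energy(day)
    if energy <= 0:
        return energy, "dead"
    if energy >= reproduce_threshold:
        return energy, "can_reproduce"
    return energy, "alive"


print(end_of_day(1.0, DayLedger(10, 2, 3, 1, 0.5), reproduce_threshold=4.0))
```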

Genome 

As mentioned above, in the dual inheritance model an 
agent’s genome is inert during its lifetime and therefore is 
not adaptive nor directly active in behavior selection. The 
primary purpose of its genome is to spread genetic informa¬ 
tion in reproductive events. The secondary purpose of its 
genome is to produce an agent’s memome upon its birth. 

A genome of an agent represents a path of resource sites 
in the random geometric network. Possible behaviors for an 
agent are also encoded at each site on this path. 
That is, a genome represents a single long path through the 
network and the actions an agent might take at each site. 

We call each site in this path a gene in the genome. A 
typical gene consists of three parts. A gathering component 
encodes a strategy for gathering resources at the respective 
site. A non-gathering component encodes the energy spent 
on non-gathering actions like breeding, learning and social¬ 
izing. Energy in our model correlates with time (except in 
reproduction). In general, more time (energy) spent breed¬ 
ing, learning or socializing increases the likelihood of these 
activities being successful (more on this below). Finally, a 
gene has a travel component which encodes the energy cost 
of traveling to the next site in the path. 

Breeding in our model occurs by sexual reproduction so 
breeding is a social activity. In our environment two agents 
must be at the same site at the same time to breed. If both 
agents are performing the breed action for overlapping pe¬ 
riods of time then a successful sexual reproduction occurs. 
We can see that more time spent breeding during the day 
will increase an agent’s chances of reproducing. However, 
spending time breeding comes at an energy cost to the agent 
as well so it can’t afford to spend all of its time breeding. 
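
The meeting condition can be expressed as an interval-overlap test. In this sketch the `Action` record and its fields are hypothetical; the only rule taken from the text is that both agents must perform the same action at the same site during overlapping periods of time.

```python
from typing import NamedTuple


class Action(NamedTuple):
    site: int
    kind: str      # e.g. "breed"
    start: float   # time of day the action begins
    end: float     # time of day the action ends


def can_pair(a: Action, b: Action) -> bool:
    """True if both actions share site and kind and their periods overlap."""
    same_place_and_kind = a.site == b.site and a.kind == b.kind
    periods_overlap = a.start < b.end and b.start < a.end
    return same_place_and_kind and periods_overlap


print(can_pair(Action(3, "breed", 2, 5), Action(3, "breed", 4, 6)))
```

A longer breeding period widens the interval and so makes an overlap with another agent more likely, which is the trade-off against energy cost described above.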

During sexual reproduction an offspring’s genome is cre¬ 
ated as a recombination of its parents’ genomes. Recom¬ 
bination uses the longest common subsequence of the two 
genomes. The offspring’s genome also has an opportunity 
to mutate in this process (see Marriott and Chebib (2015b) 
for more details of genetic mechanisms). Only in these cases 
is a genome active during an agent’s lifetime. 
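
For readers unfamiliar with the longest common subsequence, the following sketch computes it with the standard dynamic programming table. It illustrates only the subsequence computation on two lists of site ids; how the offspring genome is then assembled from it is described in the cited paper and is not reproduced here.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def longest_common_subsequence(a: Sequence[T], b: Sequence[T]) -> List[T]:
    """Return one longest common subsequence of a and b."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover one subsequence.
    result: List[T] = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


print(longest_common_subsequence([1, 2, 3, 4, 5], [2, 4, 5, 6]))
```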

At birth each agent’s genome creates a memome. A mem¬ 
ome consists of a collection of memeplexes. Each memeplex 
in our model represents a possible set of activities for a sin¬ 
gle day. Memeplexes are subsequences of a genome. During 
memome generation we start at each gene in a genome and 
we copy gene by gene into a memeplex. This continues un¬ 
til the total energy of a segment approaches the maximum 
daily energy for an agent. If copying the next gene would 
exceed the maximum energy the segment is complete and 
the memeplex is stored. Memeplexes are stored in the mem¬ 
ome along with other memeplexes starting with the same 
initial site for behavior selection (see below). 

Additionally, segments are copied in a backwards direc¬ 
tion from every gene. This means every gene in a genome is 
responsible for two memeplexes in its memome except the 
endpoints that are responsible for only a single memeplex. 
Notice that since every site in the environment is not neces¬ 
sarily represented in a genome there may be sites that do not 
have corresponding memeplexes. 
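
Memome generation as described above can be sketched as follows. This is a simplified illustration rather than the authors' code: each `Gene` is reduced to a site id and a single total `energy` value (an assumption that folds the gathering, non-gathering and travel components together), and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Gene(NamedTuple):
    site: int
    energy: float  # assumed total energy cost of this gene


def segment(genome: List[Gene], start: int, step: int, max_energy: float) -> List[Gene]:
    """Copy genes from `start` in direction `step` while the budget allows."""
    copied: List[Gene] = []
    total = 0.0
    i = start
    while 0 <= i < len(genome) and total + genome[i].energy <= max_energy:
        copied.append(genome[i])
        total += genome[i].energy
        i += step
    return copied


def develop_memome(genome: List[Gene], max_energy: float) -> Dict[int, List[List[Gene]]]:
    """Build memeplexes forwards and backwards from every gene, keyed by start site."""
    memome: Dict[int, List[List[Gene]]] = defaultdict(list)
    last = len(genome) - 1
    for i, gene in enumerate(genome):
        steps = []
        if i < last or last == 0:
            steps.append(+1)   # forward copy (not from the final gene)
        if i > 0:
            steps.append(-1)   # backward copy (not from the first gene)
        for step in steps:
            memeplex = segment(genome, i, step, max_energy)
            if memeplex:
                memome[gene.site].append(memeplex)
    return dict(memome)


genome = [Gene(site, 2.0) for site in range(4)]
memome = develop_memome(genome, max_energy=5.0)
print({site: [[g.site for g in plex] for plex in plexes] for site, plexes in memome.items()})
```

With four genes the sketch yields six memeplexes: two for each interior gene and one for each endpoint.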

Memome 

Our agent’s cognitive model is inspired by the pandemonium 
model (Jackson, 1987; Franklin, 1997; Marriott et al., 2010). 
Each memeplex is a sub-path of a genome and thus is a path 
in the random geometric network. A memeplex represents a 
single day’s worth of activities. In the pandemonium model 
a memeplex is referred to as a daemon. Daemons compete 
for control of an agent and in our model memeplexes com¬ 
pete for control of an agent. 

Behavior selection is also quite similar to the MAP-elites 
strategy for multi-objective evolutionary optimization 
(Mouret and Clune, 2015). We have memeplexes organized 
in the memome based on the starting site. An agent’s day 
begins by selecting all memeplexes in its memome that begin 
at the agent’s current site. Recall that these memeplexes 
represent a full day’s worth of activities. The memeplex from 
this set that rewards the maximum (expected) resources for 
the day while minimizing the energy expenditure is selected. 

This is the primary force of cultural selection in our cur¬ 
rent implementation. It means that memeplexes with max¬ 
imum resource-to-energy ratio are selected as behaviors. 
Since in our social learning mechanism agents that engage 
in social learning share only the current day’s memeplex it 
means this selection mechanism is also used to select which 
memeplexes are shared during social learning. 
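
The selection rule described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the `Memeplex` record, its field names, and the dictionary layout of the memome (start site mapped to candidate memeplexes) are assumptions made for the example, and the score is taken to be the ratio of expected resources to energy cost.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Memeplex:
    """Hypothetical record for one day's worth of activities."""
    sites: List[int]            # path through the network; sites[0] is the start site
    expected_resources: float   # resources the path is expected to reward
    energy_cost: float          # total energy the path costs (assumed positive)


def select_memeplex(memome: Dict[int, List[Memeplex]],
                    current_site: int) -> Optional[Memeplex]:
    """Return the candidate starting at current_site with the best
    resource-to-energy ratio, or None if no memeplex starts there."""
    candidates = memome.get(current_site, [])
    if not candidates:
        return None
    return max(candidates, key=lambda m: m.expected_resources / m.energy_cost)
```

Indexing the memome by start site keeps the daily lookup to a single dictionary access, which mirrors the way memeplexes are described as being stored alongside others that share an initial site.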

An agent can engage in individual learning only if its se¬ 
lected memeplex includes at least one meme that has a non¬ 
zero learning component. This means that an agent must 
spend time engaged in learning at at least one site during a 
day. When this occurs an agent will clone its memeplex for 
the day and apply a mutation. The new memeplex is added 
to its memome. This allows an agent to possibly generate 
a memeplex that is more efficient at the same activities or 
generate an alternative sequence of activities. 

We can see that having non-zero learning components in 
memes would benefit the agent. However, spending time 
learning during the day comes at an energy cost, as with 
breeding. Further, as our current implementation only al¬ 
lows a single learning event in a day it is not beneficial for 
an agent to spend more than the minimum amount of time 
learning. 

An agent can engage in social learning once per day. The 
process for social learning is very similar to the process for 
sexual reproduction. Social learning can only occur if an 
agent spends time engaged in a social learning action at at least
one site during a day. However, for social learning to occur,
another agent must also be at the same site at the same time 
engaged in social learning. If this occurs then the two agents 
will swap mutated copies of the memeplexes they used for 
that day. We treat this mechanism as roughly equivalent to 
telling the other agent what they did for the day. 

Again we see a benefit to having social learning in memes 
as this will increase the chances of an exchange of meme¬ 
plexes. However, as with learning and breeding, an agent 
cannot afford to spend too much time performing social 
learning during a day. 

Both of these learning mechanisms allow for new meme¬ 
plexes to be added to the memome which means an agent 
can adapt its behavior. When it begins its day at the same 
site again it may select one of the new memeplexes instead. 

Experimental Setup 

We conduct three similar experimental runs with agents 
of different capabilities. The first control group we call 
breeders, and while they still have memomes, their learn¬ 
ing and social learning mechanisms have been turned off. 
In these agents, learning and social learning components 
of genes/memes are still present but inactive. The sec¬ 
ond control group we call learners. They are similar to 
breeders as they still lack social learning mechanisms, but 
they have their individual learning mechanisms intact. Like
the breeders, they still have social learning components of
genes/memes but they are inactive. The third group is our 
experimental group. We call them socializers and they have 
all the functionality described above. 

Each run is seeded with one hundred randomly generated 
agents. All genes in the randomly generated genome have 
learning and social learning components initialized to zero. 
Since it would be impossible for agents to breed if this were 
true of breeding components, we instead have a chance of 
initializing breeding components to non-zero values. As a 
result, agents must mutate learning and social learning in or¬ 
der to take advantage of these mechanisms. We allow our 
simulations to run for 5000 days. Under these settings every 
initial population is viable although some of our socializer 
simulations terminate early due to a catastrophic colony col¬ 
lapse (see below). 

We gather data on many aspects of our agents’ lives. 
In particular we gather information on the proportion of a 
genome or memeplex devoted to breeding, learning or so¬ 
cial learning. Recall that genomes and memeplexes are both 
paths of sites and they can vary in length from agent to 
agent. We can measure the length of a genome or meme¬ 
plex in two ways. We can count the number of genes/memes 
(representing sites) or we can measure the energy cost of a 
genome/memeplex as a whole. We record the proportion of 
a genome/memeplex devoted to breeding as the total energy 
devoted to breeding in a genome/memeplex over the total 
energy cost of the genome/memeplex. We do the same for 
learning and social learning. 

We use a slightly different method to calculate the opti¬ 
mization of a genome/memeplex. For each site there is a 
gathering action. Recall the action will cost energy at least 
the number of resources rewarded at a site (one, two or three) 
and at most five. We can count all energy used above the 
minimum as wasted energy. For each genome/memeplex we 
compute the average wasted energy per gene/meme. 
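
The two measurements above can be illustrated with a short sketch. The `Gene` record and its field names are hypothetical, and the sketch assumes that the total energy cost of a genome/memeplex is simply the sum of the gathering, breeding, learning and social learning energies over all of its genes/memes.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    """Hypothetical per-site record; field names are illustrative only."""
    reward: int      # resources rewarded at the site (one, two or three)
    gather: float    # energy spent gathering (at least `reward`, at most five)
    breed: float     # energy devoted to breeding
    learn: float     # energy devoted to individual learning
    social: float    # energy devoted to social learning


def total_energy(path: List[Gene]) -> float:
    """Energy cost of the whole genome/memeplex (assumed to be a plain sum)."""
    return sum(g.gather + g.breed + g.learn + g.social for g in path)


def proportion(path: List[Gene], component: str) -> float:
    """Share of total energy devoted to 'breed', 'learn' or 'social'."""
    total = total_energy(path)
    if total == 0:
        return 0.0
    return sum(getattr(g, component) for g in path) / total


def mean_wasted_energy(path: List[Gene]) -> float:
    """Average gathering energy spent above the minimum, per gene/meme."""
    return sum(g.gather - g.reward for g in path) / len(path)
```

For example, a two-gene path with gathering energies 3 and 1 at sites rewarding 2 and 1 resources wastes one unit in total, or 0.5 per gene.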

The control groups of breeders and learners should not 
display divergent cumulative cultural evolution. Among the 
control groups we expect differences in genome and meme¬ 
plex measurements but we expect these differences to be 
small. In the socializers we expect to see evidence of di¬ 
vergent cumulative cultural evolution. We expect meme¬ 
plexes will show evidence of greater optimization than the 
genomes, at least after enough time for cumulative cultural 
evolution to emerge. Recall this is a case of divergence un¬ 
der cooperative selection pressures. 

We also expect to see divergence under competitive selec¬ 
tion pressures (i.e. when they pull in different directions). 
We expect that genetic selection will select for breeding 
while also having indirect selection for learning and social 
learning. We expect that there will remain some selection 
pressure for social learning and learning in the memeplexes, 
but the pressure to optimize these actions out of a memeplex 
will also be strong. Breeding is not to the advantage of the
memetic selection mechanism so we expect it to be selected
against by memetic selection.

Figure 3: Generation over time for breeders and socializers.
Generation is defined as the maximum generation in the
population. A child’s generation is one greater than the max
of its parents’ generations.

Observations and Discussion 

Our experiment was replicated 130 times on a variety of ran¬ 
dom geometric networks. All data presented in this section 
is averaged over these 130 runs. 

We wish to begin with a discussion of an apparent slow¬ 
down of genetic evolution caused by cultural evolution. 
When we first investigated the breeding, learning and social 
learning genes in the genomes of socializers we found that 
gene concentrations for these components grow at the same 
rate for our two control groups but at a slower rate for the 
socializers. We thought this could be due to a hiding effect 
occurring between cultural evolution and genetic evolution. 

We saw a similar effect on genome length over time and 
other data we gathered. However with further investiga¬ 
tion we were able to determine the source of the slowdown. 
In both control groups the generation of agents increased 
at identical rates. The generations of socializers increased 
at about half the rate (Figure 3). This is due to an emer¬ 
gence of eusocial breeding culture in our agents (Marriott 
and Chebib, 2016b). 

We think this evidence suggests that cultural evolution can 
slow genetic evolution over time, but possibly not over gen¬ 
erations. That is, the shielding of genetic selection pressure 
is not really there. Instead there is a selection pressure for 
longer generations which has the result of slowing genetic 
evolution over time. 

To confirm this we plotted gene concentrations over generations
instead of days (Figure 4). We can see that the socializers
actually have weak acceleration of evolution over
time according to this plot. These differences are small. We
suspect that a more significant difference might occur if
we allowed our simulation to run for more days.

Figure 4: Concentration of breeding, learning, and socializing
genes in the genomes of breeders (dotted) and socializers
(solid) over generations.

Figure 5: Average wasted energy per gene/meme in the
genomes and memomes of breeders and socializers.

Now we wish to consider evidence for divergence under 
cooperative selection pressures. So we turn to a discussion 
of the relative optimization of genomes and memeplexes. 
Breeders and socializers showed a slight trend to less op¬ 
timized genomes over time (Figure 5). This is not unex¬ 
pected as much of the genome is not actually used during 
the agent’s lifetime and thus is not subjected to selection at 
all. Further, the genomes of agents grow longer over time, so this
increases the size of the unused region. This results in
stagnation of optimization in the genome except for a very
small region. We can see in the control groups that this region
is indeed optimized. We see this in the measurement
of optimization of the daily memeplexes. The memeplexes
selected for activity are more optimized than the genome as
a whole. The memeplexes in the control runs undergo an
early stage of optimization before stagnating.

Figure 6: Concentration of breeding, learning, and socializing
memes in the memomes of breeders (dotted) and socializers
(solid).

In the socializers there is also an early stage of optimiza¬ 
tion before stagnation. However the stage of optimization 
is considerably greater in socializers than non-socializers. 
Stagnation in the control runs is in part due to a weak genetic 
selection pressure for optimization. Selection pressure for 
optimization in the memome is stronger and most evolved 
memeplexes are highly optimized (nearing zero wasted en¬ 
ergy). However the data shows an average of all agents in the 
population. Only older agents have the evolved memeplexes 
and the younger agents have memeplexes that are still just 
close copies of regions of their genomes. When we average 
over all agents we will never optimize to zero. 

Now let’s consider divergence under competitive selection
(see Figure 6). The strongest competitive pressure is relative
to breeding actions. Treating memeplexes as a type of
parasitic organism we can see that they are only concerned 
with their replication into new hosts. They are not concerned 
with their host’s reproduction, even if this leads to fewer 
available hosts in the long run. A common end for this type 
of parasite is to die off after killing all available hosts. 

We observe that soon after social learning emerges in the 
population memeplexes diverge from the genome. Breeding 
actions in a genome continue to be selected for, but breeding 
actions in a memome are selected against and their concen¬ 
tration decreases before stagnating. Notice again this stag¬ 
nation is due in part to the averaging over the population. 
Some agents still are young and have a higher concentration 
of breeding actions than more evolved memeplexes. 


We observe that in many socializers, breeding actions in 
their memeplexes are reduced to zero. This is also clear by 
noting that about 54% of breeders and learners have chil¬ 
dren while only 36% of socializers do. Memeplexes co-opt 
the agent for their reproductive ends instead of the genome’s 
reproductive ends. 

Interesting cases of collapsed colonies occur when these 
memeplexes spread to all agents before they can breed. In 
many runs of socializers we witness isolated colonies completely
dying out. Out of 100 runs, 21 ended when all agents
died out before the 5000-day limit was reached. This only occurs
in socializers.

In breeders and learners, colonies can face extinction due 
to a shortage of resources caused by overpopulation. In these 
situations agents can’t get enough resources to reproduce 
and in some cases can’t get enough to survive. However this 
situation cures itself as agents die out. As agents die, they 
no longer collect resources. The resources can instead go to 
the young and eventually young agents can reproduce. This 
causes cycles in population density but never a collapse. 

Consider learning and social learning actions. There is 
strong early selection pressure in genomes and memomes 
for these actions. However as social learning kicks in, 
memetic selection against wasting time on these actions takes
over. Remember it is beneficial to spend as little time as 
possible on learning and so optimized memeplexes spend 
energy learning at only a single site. Many optimized meme¬ 
plexes do not spend energy on learning at all. This is a detri¬ 
ment but if a memeplex is already highly optimized there is 
little benefit to learning. Further if a memeplex still spends 
time social learning an agent can still be adaptive. 

It is also beneficial to not waste time on social learning. 
As above when social learning begins optimization places a 
negative selection pressure on social learning actions. How¬ 
ever it is a memeplex’s responsibility to spread itself. A 
memeplex that evolves to spend no time on social learning 
will never be spread (but may still be useful to an agent). 
Thus, we usually observe at least one site in a memeplex 
with social learning time and often more. This means that 
a memeplex usually has a particular site at which it spreads 
itself to other agents. This builds cultures of agents around 
meme spreading sites. 

We notice that for all breeding, learning and social learning
actions, as well as for optimization, cumulative cultural evolution
causes a divergent effect from genetic evolution. In
our simulations, this divergence occurred early on (in
the first 500 days) and then social learning maintained an optimized
culture through spreading evolved memeplexes to
new agents.

To track cumulative cultural evolution we also assigned 
a generation to each memeplex. All original memeplexes 
created at birth are assigned generation zero. Whenever a memeplex
is cloned in learning or social learning the clone is assigned
a generation one higher than its parent.

Figure 7: Average and maximum generation of active
memeplexes over time with one standard deviation around
the mean.
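
The generation bookkeeping for memeplexes can be sketched in a few lines. The `TrackedMemeplex` type and the `mutate` callable are hypothetical names introduced for illustration; the mutation operator itself is not specified here and is passed in by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class TrackedMemeplex:
    """Hypothetical memeplex carrying its cultural generation."""
    sites: Tuple[int, ...]
    generation: int = 0   # memeplexes created at birth start at zero


def clone(parent: TrackedMemeplex,
          mutate: Callable[[Tuple[int, ...]], Tuple[int, ...]]) -> TrackedMemeplex:
    """Clone a memeplex during learning or social learning.
    The clone's generation is one higher than its parent's."""
    return TrackedMemeplex(sites=mutate(parent.sites),
                           generation=parent.generation + 1)
```

Because the record is immutable, the parent keeps its own generation, so a rising average generation among active memeplexes reflects repeated cloning rather than in-place edits.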

We can detect cumulative cultural evolution by detecting 
an increase in memeplex generation over time, especially 
from generation to generation (Figure 7). We can indeed 
confirm cumulative cultural evolution in our socializers by 
this method. Of course breeders can only have a memeplex 
generation of zero. Learners however can have a meme¬ 
plex generation above zero if they learn better memeplexes. 
However these memeplexes can never leave their initial host 
and die with the host. Therefore, no cumulative culture can 
accrue (Kempe et al., 2014). We do see that learning agents 
can maintain a low non-zero memeplex generation but not 
one that increases over time. 

Interestingly, in the socializers we do notice two stages in 
memeplex generation growth. The initial stage is more rapid 
and the second stage grows slowly at a fixed rate over time. 
We note that the period of rapid growth coincides with the 
period in which the memeplexes are being optimized prior to 
stagnation. The slower growth rate coincides with the stage 
where agents pass around optimized memeplexes. 

Conclusions 

Our implementation incorporated two information stores, 
one for genetic information and one for cultural information. 
It also had mechanisms of horizontal transfer of cultural in¬ 
formation between agents of multiple generations. Our im¬ 
plementation is an example of our dual inheritance model of 
cultural evolution (Marriott and Chebib, 2016a). Our imple¬ 
mentation demonstrates divergent cumulative cultural evo¬ 
lution under both conditions of cooperative and competitive 
selection pressures. 

We can see that populations of agents that participate in 
the dual inheritance model can accelerate optimization relative
to selection pressures that are cooperative between the
genetic and memetic world. This is due to the potential for
many cultural generations in a single biological generation. 
Thus, optimization can occur much faster over real time in 
cultural evolution than in biological evolution. Further, since 
the selection pressures on both the genome and the memome
operate in the same direction there cannot be conflict between
these pressures. The only divergence in this case is in
terms of speed to convergence. 

We also see cases where the selection pressure on the
genome operates in the opposite direction to that on the memome.
Genes and memes care only for spreading themselves. For 
genes, spreading occurs through reproductive events but for 
memes spreading occurs through social learning events. So 
it is not surprising that genes that increase the success of 
reproductive events are selected for by biological evolution 
and memes that increase the number and success of social 
learning events are selected for by cultural selection. 

The contrary is not necessarily true. Genes that increase 
the number and success of social learning events may be 
selected for by biological evolution if social learning also 
helps improve survival or reproductive success. This is true 
in our simulation. Memes that increase the number and suc¬ 
cess of reproductive events have only a distant and indirect 
impact on the number and success of social learning actions. 
Since they also have a detrimental effect on the optimization 
of the memeplex there is a considerably stronger selection 
to avoid these actions. 

Finally, we still see an interesting divergence in behav¬ 
ior of young (inexperienced) agents and old (experienced) 
agents. Young agents have had very little or no time to adapt 
their initial set of memeplexes either through individual opti¬ 
mization or through learning from others. Thus their behav¬ 
ior is still largely determined by their genome, which may 
also be the case in humans (Tomasello, 2016; Csibra and 
Gergely, 2011). This means they are more likely to breed, 
learn and social learn than old agents because all of these 
actions occur in much higher concentrations in the genome 
than in the memome. 

The interesting impact of these trends is that young agents 
are more likely to be parents (i.e. before they learn better).
They are more likely to learn from the environment
than older experienced agents. Finally, they are also
more social. They are more likely to seek out social learn¬ 
ing events than old agents. We find this conclusion inter¬ 
esting for two reasons. First, we see an emergent organiza¬ 
tion in our populations around age. Second this organization 
mimics the same organizations in other models and natural 
populations (Lehmann et al., 2013; Thornton and Malapert, 
2009). Young humans are more likely to have children, more 
likely to attempt to improve themselves through learning, 
and more likely to seek out the knowledge of others than 
their older counterparts (Demps et al., 2012; Hewlett et al., 
2011). Further older humans that engage in social learning 
are more often teachers than learners and this is also borne
out by our experiment.

Finally we believe that the observed divergent cumulative 
cultural evolution is due to a critical component of the dual 
inheritance model. Specifically we think it is critical to keep 
genetic and cultural information separate from one another 
even if they store the same kinds of information (as in our 
implementation). Without separate information stores there 
is no environment for divergence to occur. Secondly it is im¬ 
portant that cultural information can be transmitted between 
members of the same generation and between generations.

Abstract

The ability of European badgers to establish communal 
latrines at their territory boundaries is a well-known but 
poorly understood example of group-level biological organ¬ 
isation. To what extent might we expect it to arise via self¬ 
organisation rather than as the result of specific adaptations? 
This paper replicates and extends a model of badger forag¬ 
ing and territoriality to include defecation, “faecotaxis” and 
overmarking behaviours, and shows that communal boundary 
latrines arise spontaneously through stigmergy in both terri¬ 
torial and non-territorial badgers, with no need for specific 
cognitive or behavioural adaptations such as spatial memory, 
or individual recognition. The model suggests that faecotaxis 
and overmarking behaviours are necessary for boundary la¬ 
trine formation, that culling has little effect on the prevalence 
of faecal sites (implicated in the spread of bovine tuberculosis 
in the UK), and that the spatial micro-structure of the envi¬ 
ronment is significant to the self-organisation process. 

Introduction 

One key Artificial Life research question is understand¬ 
ing the extent to which living systems result from self¬ 
organisation (Kauffman, 1993; Goodwin, 1994) or adapta¬ 
tion (Dawkins, 1986). Here we develop a simple, spatially 
extended model of species-environment self-organisation to 
better inform our expectations regarding the spatial patterns 
that analogous living systems are capable of. 

Many animal and insect species establish and use com¬ 
munal toilets. From ants (Czaczkes et al., 2015) to lemurs 
(Irwin et al., 2004) and rhinoceroses (Freeman et al., 2014), 
there are many examples of organisms preferentially leav¬ 
ing faeces, urine and other waste in areas dedicated for this 
purpose. The function of these waste chambers, latrines and 
middens is not always limited to sanitation, but can involve 
defence (where latrines are used to mark territory bound¬ 
aries) and communication (where scent marking at latrines 
is used to signal information about, e.g., mate quality). 

The European badger, Meles meles, is the only British 
mammal that uses purpose-built toilets (other than humans). 
They establish communal latrines within their territories and 
at territory boundaries (Roper et al., 1993). Biologists believe
that communal boundary latrines may play a communicative
role, allowing each group of badgers to communicate
with other nearby groups, sharing information on the
status of each clan (e.g., Stewart et al., 2002).

While there have been studies of latrine use in badgers 
(e.g., Roper et al., 1986) and models of territoriality in bad¬ 
ger populations (e.g., Stewart et al., 1997), there is little un¬ 
derstanding of how communal latrines arise, how their spa¬ 
tial distribution is influenced by badger foraging behaviour 
and territoriality, the extent to which specific adaptations are 
required in order to establish and maintain them, and how 
their use and distribution might be affected by changes in 
population size. This last issue is particularly timely given 
recent attempts to control the spread of bovine tuberculosis
from badger faeces to cattle in the UK through culling of
the badger population (Independent Scientific Group on Cattle
TB, 2007; Donnelly and Woodroffe, 2015).

The current paper presents a simple null model of com¬ 
munal latrine formation and uses it to establish how readily 
communal latrines arise in a population of simulated bad¬ 
gers, to identify which factors influence the spatial distribu¬ 
tion of latrines, and to understand the effect of simulated 
culling. The model confirms that communal latrines can 
arise spontaneously when badgers exhibit a minimal form 
of faecotaxis and overmarking behaviour, that they tend to 
self-organise along territory boundaries for territorial badgers,
and at points equidistant between setts for non-territorial,
free-roaming badgers, and that culling is not, prima facie, an 
effective way to limit the distribution of badger faeces. 

The paper goes on to explore the role of spatial embedding 
in facilitating communal latrine formation, demonstrating 
that shared boundary latrines can arise in a system that has 
no spatial structure within each territory. These results raise 
the possibility that the model might be used to shed light on 
territorial social media behaviour exhibited by communities 
of humans (Williams et al., 2015), where online activity at 
sites frequented by multiple communities may play a similar 
defensive or communicative function to that of deposits left 
by badgers and other mustelids, despite social media activity 
taking place in a non-Euclidean virtual space.

The model


The model of badger foraging and latrine formation that will 
be developed and presented here is an extension of a previ¬ 
ous simulation model of badger foraging and territoriality 
(Stewart et al., 1997). This model did not include defeca¬ 
tion or latrine formation, but was developed to explore the 
hypothesis that territoriality in Meles meles arises as a con¬ 
sequence of Passive Range Exclusion (PRE), i.e., badgers 
from adjacent clans tend to avoid each other’s home range 
merely as a consequence of optimizing their foraging effi¬ 
ciency, since food gradients tend to peak at or around home 
range boundaries. 

The PRE model represents the badgers’ environment as an 
array of hexagonal cells, organised into seven close-packed, 
non-overlapping hexagonal territories. Each territory com¬ 
prises a central cell containing a sett (the underground home 
of a clan of badgers), surrounded by r rings of cells (see Fig¬ 
ure 1). (While the original PRE model did not allow badgers 
to move beyond the outer edges of the outer territories, in 
our replication we allow the environment to be a hexagonal 
torus in which the outer cells are adjacent to the appropriate 
cells on the opposite side of the environment.) 

Each model run simulates n consecutive nocturnal forag¬ 
ing periods of T timesteps each. Initially, B badgers are al¬ 
located to each sett. All badgers in a sett have the same strat¬ 
egy. During each foraging period, each badger leaves its sett 
and moves around the environment consuming the food that 
it discovers. By the end of each foraging period, each clan 
of B badgers will have returned to their sett where they will 
remain inactive for T timesteps until the beginning of the 
next nocturnal foraging period. At the start of each forag¬ 
ing period food is redistributed throughout the environment. 
Every cell that is not a badger sett has a probability P_F of
being allocated F units of food, and a probability 1 - P_F of
being empty. At the end of each nocturnal foraging period
any remaining food is removed from the environment. 
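
The nightly food allocation can be sketched as below. This is an illustrative sketch only: the function name, the representation of cells as hashable identifiers, and the use of a seeded `random.Random` instance are assumptions, while `p_f` and `food_units` stand for the probability and the F units of food described above.

```python
import random
from typing import Dict, Hashable, Iterable, Set


def distribute_food(cells: Iterable[Hashable],
                    setts: Set[Hashable],
                    p_f: float,
                    food_units: float,
                    rng: random.Random) -> Dict[Hashable, float]:
    """Allocate food at the start of a foraging period.
    Each non-sett cell independently receives food_units with probability
    p_f and is left empty otherwise; sett cells never hold food."""
    return {
        cell: (food_units if rng.random() < p_f else 0.0)
        for cell in cells
        if cell not in setts
    }
```

Building a fresh dictionary each night also covers the rule that leftover food is removed at the end of every foraging period, since nothing carries over from the previous allocation.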

During each foraging timestep, badgers that are not busy 
feeding move simultaneously to an adjacent cell (they may 
not enter the cell that they last occupied or any cell that con¬ 
tains a sett). When badgers move onto a cell containing food 
that is not already being eaten, they start to consume it, tak¬ 
ing a handling time of h timesteps before they continue to 
forage in other cells. A badger moving onto a cell containing 
food that was already being eaten by another badger or bad¬ 
gers on the previous timestep ignores the food. If b badgers 
move simultaneously onto the same cell, and this cell con¬ 
tains food that is not already being eaten, the food is shared 
equally amongst the b badgers, with each badger receiving
F/(bh) units of food for each of h timesteps.

When the shortest distance from a badger to its home sett, 
d, is equal to the number of time steps remaining in the cur¬ 
rent nocturnal foraging period, it ceases foraging and pro¬ 
ceeds to move in the direction of its home sett (even if this 
means leaving food uneaten or moving into a cell that it occupied
on the previous timestep).

Figure 1: Seven badger territories, each comprising a central
sett (open circle) surrounded by r = 10 rings of cells. Filled
circles indicate cells containing food (P_F = 0.3). Grey
edges represent transitions between territories.

The PRE model considered three badger strategies: 

• Free-Roamers may move to any adjacent legal cell when 
foraging, ignoring considerations of territoriality. 

• Overlappers must remain within a distance d ≤ r + w
of their home sett, i.e., they may move up to w cells
into an adjacent territory. Overlappers generalise the PRE
model’s Territorial Badgers, which are Overlappers with
w = 0.

• Boundary-Walkers are badgers that behave as Free- 
Roamers until they first reach a cell at the perimeter of 
their own territory (i.e., d = r), after which they must forage
within a distance w of their territory’s boundary (i.e.,
r - w ≤ d ≤ r + w) until they return home towards the
end of the foraging period.
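
The three strategies differ only in which distances from the home sett they permit, which can be expressed as a single predicate. The sketch below is illustrative: the function name, the strategy labels and the `reached_boundary` flag are hypothetical, and the distance bounds are assumed to be inclusive, so that a Territorial badger (an Overlapper with w = 0) can still occupy the outermost ring of its own territory.

```python
def within_range(strategy: str, d: int, r: int, w: int,
                 reached_boundary: bool = False) -> bool:
    """Return True if a cell at distance d from the home sett is permitted.

    strategy          one of 'free_roamer', 'overlapper', 'boundary_walker'
    r                 number of rings of cells in a territory
    w                 permitted overlap or boundary-zone width, in cells
    reached_boundary  whether a Boundary-Walker has already reached d == r
    """
    if strategy == "free_roamer":
        return True
    if strategy == "overlapper":
        return d <= r + w
    if strategy == "boundary_walker":
        if not reached_boundary:
            return True  # behaves as a Free-Roamer until the perimeter is reached
        return r - w <= d <= r + w
    raise ValueError(f"unknown strategy: {strategy}")
```

A movement routine would apply this predicate when building the set of legal adjacent cells, alongside the rules that exclude sett cells and the cell last occupied.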

Extending PRE 

Here, after replicating the key results of the PRE model (see
Figure 2), we extend it in order to model latrine formation by
including (i) defecation, (ii) a minimal form of “faecotaxis”
(taxis towards sites of recent defecation), and (iii) a simple
form of overmarking behaviour (the tendency to defecate on
top of previous faecal deposits).

For each badger, we define the number of timesteps 
elapsed since the badger’s last defecation, p. Badgers are 
initialised with p = T to represent the period of inactivity
immediately before the first nocturnal foraging period, and
the value increases by one during each foraging timestep, 
and by T during each period of daytime inactivity. Each 
badger’s p counter is reset to zero when it defecates. 

In order to implement a tendency for badgers to move
towards nearby sites of defecation we implement a biased
random walk, favouring adjacent locations with more faecal
deposits. We set the strength of this bias to increase linearly
with p, the length of time since a badger’s last defecation.
Instead of moving a badger to a cell selected at random from
the set of all legal adjacent cells, we select from a random
subset of these cells the cell with the most faecal deposits,
breaking any ties at random.

Where the set of legal cells adjacent to a badger is L, we
consider a random subset of

$$\left\lfloor \frac{|L| \min(T, p)}{T + 1} \right\rfloor + 1$$

cells from L. Consequently, immediately after defecation a badger
will move to a random legal adjacent cell (subset size is 1),
whereas T or more time steps after defecation it will always
move to the legal adjacent cell with the greatest number of
faecal deposits (subset size is |L|). Note that this faecotaxis
is minimal in that it only considers the cells immediately
adjacent to a badger.
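
A minimal sketch of this biased step is given below. It assumes that the subset size is floor(|L| min(T, p) / (T + 1)) + 1, a form chosen because it reproduces both limiting cases stated above (size 1 when p = 0 and size |L| once p reaches T, for the small neighbourhoods of a hexagonal grid). The function names and the dictionary of deposit counts are hypothetical.

```python
import math
import random
from typing import Dict, Hashable, List


def subset_size(n_legal: int, p: int, T: int) -> int:
    """Number of legal neighbouring cells a badger inspects.
    Assumed form: floor(n_legal * min(T, p) / (T + 1)) + 1."""
    return math.floor(n_legal * min(T, p) / (T + 1)) + 1


def choose_cell(legal_cells: List[Hashable],
                deposits: Dict[Hashable, int],
                p: int, T: int,
                rng: random.Random) -> Hashable:
    """Pick the next cell: sample a random subset of the legal cells and
    move to the one with most faecal deposits, breaking ties at random.
    Assumes legal_cells is a non-empty list."""
    k = subset_size(len(legal_cells), p, T)
    subset = rng.sample(legal_cells, k)
    best = max(deposits.get(c, 0) for c in subset)
    return rng.choice([c for c in subset if deposits.get(c, 0) == best])
```

With T = 100 and five legal neighbours, a badger that has just defecated inspects one cell (a pure random walk), one at p = 50 inspects three, and one at p = 100 or more inspects all five.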

The probability, P_d, that during a particular timestep a
badger will defecate is defined as a sigmoid function of p,
the time since last defecation, and f, the number of faecal
deposits in the cell, parameterised by k, the steepness of the
sigmoid, θ, the position of the sigmoid mid-point, and α, a
factor controlling the strength of the overmarking tendency,
i.e., the influence of any faecal deposits in the cell:

$$P_d = \frac{1}{1 + e^{-kx}}, \quad \text{where } x = p - \theta + \alpha f$$

Simulations described here use: k = 0.25, θ = 1 - (T/20),
and α = T/20 in order to achieve a relatively sharp
transition from low to high probability near the end of the
day.
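
The defecation probability can be sketched as a standard logistic function. The sketch assumes the form P_d = 1 / (1 + e^(-kx)) with x = p - θ + αf, where the negative exponent is an assumption chosen so that the probability rises with the time since last defecation and with the number of deposits already in the cell. The mid-point and overmarking strength are left as explicit arguments rather than fixed to particular values.

```python
import math


def defecation_probability(p: float, f: int, k: float,
                           theta: float, alpha: float) -> float:
    """Logistic probability of defecating during one timestep.

    p      timesteps since the badger last defecated
    f      number of faecal deposits in the current cell
    k      steepness of the sigmoid
    theta  position of the sigmoid mid-point
    alpha  strength of the overmarking tendency
    """
    z = k * (p - theta + alpha * f)
    # Evaluate the logistic in a numerically stable way for either sign of z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

Each existing deposit shifts the curve by alpha timesteps, so a badger standing on an established latrine reaches any given probability alpha timesteps sooner per deposit than it would on a clean cell.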

At the start of each nocturnal foraging period, any fae¬ 
cal deposits older than zT timesteps are removed from the 
environment. For the results reported here, z = 5, which 
is consistent with estimations of how long deposits can be 
considered “fresh” (Roper et al., 1993). 
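
Deposit ageing can be sketched as a filter applied at the start of each foraging period. The sketch assumes that every deposit is stored with the absolute timestep at which it was made and that a single clock counts both foraging and inactive timesteps; the function name and data layout are hypothetical.

```python
from typing import Dict, Hashable, List


def remove_stale_deposits(deposits: Dict[Hashable, List[int]],
                          now: int, z: int, T: int) -> Dict[Hashable, List[int]]:
    """Drop faecal deposits older than z * T timesteps.

    deposits  maps each cell to the timesteps at which deposits were made
    now       current timestep (start of the foraging period)
    """
    max_age = z * T
    fresh: Dict[Hashable, List[int]] = {}
    for cell, times in deposits.items():
        kept = [t for t in times if now - t <= max_age]
        if kept:
            fresh[cell] = kept  # cells with no remaining deposits are omitted
    return fresh
```

The number of deposits in a cell, as used by faecotaxis and overmarking, is then simply the length of the list stored for that cell.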


Results 

Replication 

The reimplementation confirmed that badger foraging be¬ 
haviour tends to establish a food density gradient that peaks 
at or around the boundary between adjacent territories (Fig¬ 
ure 2), ensuring that Boundary Walkers who spend most 
time in the boundary zone tend to forage more successfully 
and that this advantage increases with the initial density of 
food in the environment (Figure 3). 

The food density gradient results from the fact that there are 
only a small number of heavily foraged cells near to a sett 
compared to the larger number of cells at the periphery 
which are foraged less readily. Since it takes time for badgers 
to reach the boundary zone, it tends to remain a richer 
source of food for longer during each foraging period. 

Figure 2: Results from the original PRE model (top) agree 
closely with those from the reimplementation (bottom). 
Both plots depict the establishment, by Boundary Walkers, 
of a food gradient in the central territory running from the 
central sett to the territory boundary. The dip in food density 
that arises at the boundary after 80 timesteps results from the 
boundary-restricted movement of Boundary Walkers. The 
peripheral territories contain Territorial badgers. Gradients 
are plotted at 10 timestep intervals with r = 10, T = 100, 
$P_F$ = 1.0, F = 1.575, B = 20, h = 4, and n = 10 foraging 
periods. (While the meaning of the original (top) y-axis label 
is not clear, it is assumed to be a measure of food density 
along the lines of that used in the lower plot.) 

However, Figure 4 demonstrates that overall foraging success 
is higher for Free-Roamers than for either Territorial 
badgers or Boundary Walkers. Restricting movement tends to 
limit food intake because badgers more often re-enter cells 
they visited previously. A direct comparison of the efficiency 
of the different strategies was not reported by Stewart et al. 
(1997), but this result tends to undermine the passive range 
exclusion hypothesis, which holds that badgers forage within 
a “territory” merely to achieve higher rates of foraging success. 

Latrines 

When the model is extended to include defecation, faecotaxis 
and overmarking behaviours, badgers readily achieve 
communal latrines (defined here as those with >10 faecal 
deposits at the end of the final foraging period). These latrines 
tend to appear at territory boundaries and also close 
to badger setts (Figure 5). Communal boundary latrines are 
achieved both by Overlappers (who may only forage up to 
w cells inside an adjacent territory) and by Free-Roamers 
(who are ignorant of any territory boundaries), although the 
communal latrines achieved by the latter strategy are not as 
tightly distributed at territory boundaries (Figure 6). 

Figure 3: A comparison of the PRE model (dashed line) and 
the replication (solid line) in terms of the intercepts (circles) 
and gradients (diamonds) of linear models regressing 
the amount of food consumed by an Overlapper on the time 
it spent in the boundary zone for different values of $P_F$. 
Parameters as per Figure 2, save that F = 200 and n = 25. 

Figure 4: Free-Roamers collect more food than Boundary 
Walkers regardless of both clan size and initial food density. 
Model parameters are: r = 10, T = 100, F = 200, h = 4, 
and n = 25 foraging periods. Standard errors (not shown) 
are ≈ 70 food units on average. 

Communal latrines at the sett result from the tendency 
for badgers to defecate at the start of each foraging period, 
whereas communal boundary latrines result from a combination 
of (i) higher numbers of Overlapper or Free-Roamer 
badgers in the boundary zone, and (ii) positive feedback 
resulting from faecotaxis and overmarking, which encourages 
badgers to add to existing latrines rather than start new ones. 

Figure 7 demonstrates that cells foraged by two clans of 
Overlapper badgers (i.e., cells along a territory boundary) 
are more likely to be the site of a communal latrine than 
cells foraged by one clan (i.e., cells within a territory), and 
that cells at the intersection of three territories are the most 
likely to become communal latrines. 

Figure 6: The proportion of cells containing communal latrines 
at differing distances from the nearest sett for Overlappers 
(filled bars) and Free-Roamers (open bars). Over 95% 
of cells immediately adjacent to setts contained communal 
latrines (not shown). Figures depict data from the final state 
of 25 runs with parameters as per Figure 5. 

Figure 7: Simulation results from Figure 5a, replotted to 
depict the proportion of communal latrine cells within the 
hinterland of a single Overlapper territory (cells within a 
single territory), at the boundary between two territories 
(cells bordering two territories), or at the intersection between 
three territories (cells bordering three territories). Whiskers 
represent standard deviations. 

To what extent are communal boundary latrines dependent 
on the positive feedback brought about by faecotaxis 
and overmarking behaviour? Might only one of these behaviours 
be necessary, or even neither of them? Figure 8 
shows Overlapper latrine distributions that arise when each 
of these behaviours is suppressed. In neither case are communal 
boundary latrines achieved at a significant rate, indicating 
that both behaviours are crucial (Figure 9). 
Figure 5: Representative latrine distributions after n = 200 nocturnal foraging periods for (a) Overlappers, and (b) Free-Roamers. 
Setts are represented by open circles. Latrines are represented by filled circles. Larger circles contain more faecal 
deposits. Latrines closer to their nearest sett are represented by darker blue circles. Red circles represent latrines in the boundary 
zones between territories (which are toroidal at the periphery of the environment). Model parameters are: r = 10, T = 100, 
F = 200, B = 20, h = 4, $P_F$ = 0.3 and w = 1. 



Figure 8: Representative latrine distributions after n = 200 nocturnal foraging periods for Overlappers with either (a) no 
faecotaxis behaviour, or (b) no overmarking behaviour. Data visualization and model parameters are as per Figure 5. 
Figure 9: The proportion of cells containing communal latrines 
at differing distances from the nearest sett for Overlappers 
with either no faecotaxis behaviour (filled bars) or 
no overmarking behaviour (open bars). Again over 95% of 
cells immediately adjacent to setts contained communal latrines 
(not shown). Figures depict data from the final state 
of 25 runs with parameters as per Figure 5. 

Culling 

One route by which badgers might cause bovine tuberculosis 
in cattle is through contact between cows and infected 
badger faeces. In the UK, there have been attempts to reduce 
bovine TB by culling badger populations (Independent 
Scientific Group on Cattle TB, 2007), although the policy has 
been controversial (Donnelly and Woodroffe, 2015). Figure 10 
shows how the distribution of faecal deposits is affected 
by changes in clan size. While reducing clan size 
reduces the average number of faecal deposits at each site at 
which faecal deposits are present, the number of such sites 
is relatively stable. This is because the stigmergic feedback 
that tends to encourage badgers to defecate in the same place 
becomes weaker as the number of badgers is decreased. 
Consequently, when badger populations are small, although 
the total volume of faeces present in the environment is reduced, 
it is distributed just as widely. Hence, in a simple 
model such as this one, culling is not an effective way to 
reduce the rate at which cattle encounter badger faeces. 

Eroding Spatial Constraints 

To what extent is the tendency for communal latrines to arise 
at territory boundaries dependent on the environment being 
spatially structured? In order to explore this question, we 
introduce a degree of “random rewiring” to the environment, 
eroding the spatial correlation structure within each territory 
and between each pair of adjacent territories. The rewiring 
scheme is as follows: 

1. Start with a graph, G, representing the original environment 
structure in which every cell is represented by a node 
and every pair of adjacent cells is represented by an edge. 


Figure 10: Influence of clan size on the distribution of faecal 
deposit sites, i.e., cells containing at least one faecal deposit 
at the end of a run. Bars represent the number of sites. 
Diamonds represent the mean number of deposits at each 
site. Data is averaged over 25 runs (whiskers represent standard 
deviations). Runs with the same parameters as Figure 5 
are represented by the solid bar and empty diamond. Other 
bars/diamonds represent runs that differ only in terms of clan 
size. Reducing clan size lowers the average number of deposits 
at faecal deposit sites, but reductions of as much as 
75% fail to result in a significant reduction in the number of 
sites. 

2. For each undirected edge, $ij \in G$, with probability $P_r$ 
remove it from G and add i and j to a list $L_{IJ}$, where I 
and J are the territories of i and j, respectively. (Note that 
a list may contain the same node more than once.) 

3. Merge each pair of lists $L_{IJ}$ and $L_{JI}$, where $I \neq J$. 

4. For each list, until it is empty, repeatedly pick a random 
pair of nodes, i and j, from the list, ensuring that $i \neq j$ 
and $ij \notin G$, and add an undirected edge ij to G. 
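
The four steps above can be sketched as follows. This is an illustrative 
reading of the scheme rather than the authors' implementation: the 
function name, the edge and territory representations, and the `max_tries` 
cap are all assumptions. The cap is added because the final pairing step 
can become stuck when the remaining nodes admit no valid pair. Keying each 
list on the unordered pair of territories performs the merge of step 3 
implicitly. 

```python
import random
from collections import defaultdict

def rewire(edges, territory, p_r, rng=None, max_tries=1000):
    """Territory-preserving random rewiring (illustrative sketch).

    edges: iterable of (i, j) undirected edges without self-loops.
    territory: dict mapping node -> territory identifier.
    Returns (set of frozenset edges, number of endpoints left unpaired).
    """
    rng = rng or random.Random()
    graph = {frozenset(e) for e in edges}            # step 1: the graph G
    lists = defaultdict(list)
    # Step 2: remove each edge with probability p_r and file both endpoints
    # under the (unordered) pair of territories it connected.
    for edge in sorted(graph, key=sorted):           # sorted copy, safe to mutate graph
        if rng.random() < p_r:
            graph.discard(edge)
            i, j = tuple(edge)
            # Step 3 is implicit: an unordered key merges L_IJ with L_JI.
            lists[frozenset((territory[i], territory[j]))] += [i, j]
    leftover = 0
    # Step 4: pair up endpoints at random, rejecting self-loops and duplicates.
    for nodes in lists.values():
        tries = 0
        while len(nodes) >= 2 and tries < max_tries:
            a, b = rng.sample(range(len(nodes)), 2)
            i, j = nodes[a], nodes[b]
            if i != j and frozenset((i, j)) not in graph:
                graph.add(frozenset((i, j)))
                for idx in sorted((a, b), reverse=True):
                    nodes.pop(idx)
                tries = 0
            else:
                tries += 1
        leftover += len(nodes)                       # endpoints that could not be paired
    return graph, leftover
```

Each node appears in a list once per removed edge and is consumed once per 
added edge, so node degrees are preserved whenever no endpoints are left 
unpaired. 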

This rewiring scheme is constrained to ensure that, whilst 
it erodes the local spatial structure within and between territories, 
it does not change the overall gross structure of the 
environment and does not introduce new biases that favour 
particular sites. If an original edge linked two nodes within 
the same territory then so will the new randomly rewired 
edge. If an original edge linked nodes from two different 
territories, then the new randomly rewired edge will 
link nodes from the same pair of territories. Consequently, 
while the micro-structure within and between territories is 
eroded by rewiring, the gross territorial structure of the environment 
is maintained. This means that, amongst other 
things, the environment ceases to be a lattice with high clustering 
and long characteristic path lengths, and each sett is 
no longer equidistant from the boundary zones with adjacent 
territories. Moreover, the random rewiring process does not 
change the total number of neighbours that any node possesses 
(i.e., the network’s degree distribution remains unchanged). 
This is important, since high-degree nodes will 
tend to experience higher traffic and will thus tend to have a 
higher chance of becoming latrine sites. 

Figure 11: The proportion of communal latrine cells within 
a single territory’s hinterland, at the boundary between two 
territories, or at the intersection between three territories for 
Overlappers and Free-Roamers in environments where spatial 
structure has been eroded by random rewiring ($P_r$ = 
1.0). Model parameters are as per Figure 5. 

Figure 11 indicates that, when environments are fully 
rewired (i.e., $P_r$ = 1.0), while Free-Roamers no longer tend 
to establish communal latrines at territory boundaries, Overlapper 
communal latrines continue to be over-represented in 
these boundary zones (although to a lesser extent than in 
unrewired environments). Figure 12 shows example distributions 
of Overlapper and Free-Roamer latrines in these 
fully rewired environments. 

Discussion 

In reality, both the behaviour and environment of the European 
badger are, of course, far more complex than the 
idealised simulation presented here. They spend time on a 
range of activities other than foraging and defecating, and 
their world amounts to more than a regular array of setts surrounded 
by a uniform random distribution of food. In particular, 
there is evidence that badgers prefer to establish latrines 
along linear environmental features such as hedges or walls 
(Delahay et al., 2007), and that they commute directly to latrine 
locations (Roper et al., 1986). It is in this sense that the 
simulation presented here is a null model: one that establishes 
a set of basic expectations concerning the behaviour 
exhibited by a certain class of system, against which to evaluate 
the actual patterns discovered in empirical data. 

The model demonstrates that communal latrines may arise 
as a result of foraging and defecation, but only if there are 
positive feedbacks (faecotaxis and overmarking) that amplify 
existing defecation sites. The model suggests that we might 
expect communal latrines to arise disproportionately often at 
territory boundaries, as a result of the increased traffic from 
multiple clans foraging in these areas. Moreover, communal 
latrines might arise in these same locations before territories 
and territorial behaviour are established. This raises the 
possibility that communal latrines might shape territoriality 
rather than vice versa. 

The model suggests that reducing badger numbers will 
not tend to reduce the rate at which cattle encounter badger 
faeces. However, introducing more sophisticated badger 
behaviours to the model (such as a tendency to deliberately 
visit latrines, rather than stumble upon them) may 
reverse these findings. Conversely, the model does not consider 
how the transmission of infection might be related to 
latrine size, with small deposits potentially posing more risk 
to cattle than large latrines that can be easily avoided. 

Finally, preliminary results suggest that there may be considerable 
potential for using a simulation model such as the 
one presented here to explore a wider set of stigmergic behaviours 
in environments that are not embedded in physical 
space, such as human behaviour on the Internet (Williams 
et al., 2015). Questions might include: how and when might 
online social media sites that are utilised by multiple groups 
with different competing belief systems become platforms 
for effective communication between them? 

Conclusion 

We have demonstrated that communal latrines can arise 
at territory boundaries disproportionately often as a result 
of simple stigmergic behaviours (defecation, faecotaxis and 
overmarking) without the need for specific cognitive or behavioural 
adaptations such as memory, spatial knowledge, 
agonistic interactions, deliberate latrine visits, or recognition 
of in-group or out-group individuals or their faecal deposits. 

Significantly, communal latrines mid-way between badger 
setts are achieved by non-territorial badgers, suggesting 
that such latrines may precede territories (and even help 
to bring them about). While the establishment of boundary 
latrines is facilitated by the environment’s spatial micro-structure 
(i.e., correlation structure due to Euclidean geometry), 
they can be established by territorial agents even when 
this micro-structure is eroded completely. 

Acknowledgments: Thanks to George Salter for preliminary 
work on a replication of the PRE model, and to Henrietta 
Wilson and Alex Beyfus for comments on a draft. 

A note on the title: The phrase “shit happens” is an idiom 
that means “some things occur without a specific reason”, 
and thus seems apposite here despite its vulgarity. 
Figure 12: Representative latrine distributions after n = 200 nocturnal foraging periods for (a) Overlappers and (b) Free-Roamers 
in environments where spatial structure has been eroded by random rewiring ($P_r$ = 1.0). Line segments represent a 
fraction of the connections between neighbouring cells (those that link cells located within 3 spatial units of each other) to avoid 
visual clutter (coloured links: cells belong to the same territory; grey links: cells belong to different territories). Otherwise, 
data visualization and model parameters are as per Figure 5. 
Abstract 

This research presents computational models to represent visual 
navigation mechanisms which guide pigeons (Columba livia) 
during flight. A 3D graphics computer simulator was developed 
to model autonomous flight in virtual pigeons. The aim was to 
investigate the role of (i) visual landmarks, (ii) flocking with 
other pigeons, and (iii) image familiarity in pigeon navigation. 

A recursive processing algorithm enabled landmarks and other 
pigeons to be located, identified and counted. Image processing 
could form a feasible mechanism for autonomous visual 
navigation by identification of familiar route headings. This 
could be used in autonomous flying drones or flight simulators. 

Introduction and History 

Pigeons (Columba livia) were the first domesticated bird, 
around 6000 years ago. This long and important history of 
interaction with humans has enhanced their capability to 
communicate. Modern pigeons can understand hand signals, 
voice commands and recognise individual humans even when 
wearing different clothing (Brett et al. 2015). During 
domestication, the fastest and most reliable message carrier 
pigeons have been favoured for breeding which has produced 
faster, stronger pigeons which can fly further with enhanced 
homing ability. Homing is defined as the process of a pigeon 
navigating back to its loft after being released remotely - 
either independently or in flocks. Pigeons have been known to 
successfully navigate home to their loft from 7200 miles 
which is almost one third of Earth’s circumference. Racing 
pigeons commonly fly at speeds of up to 80 miles per hour, 
flying for 24 hours without stopping. 

Navigation mechanism in pigeons. Pigeon navigation 
mechanisms involve (1) the position of the sun in the sky 
relative to the hour of day, known as the solar clock, (2) sensation 
of the earth’s magnetic forces, known as magnetoception, and (3) 
visual recognition of landmarks, especially for the region 
closest to the loft. The solar clock is preferred by the pigeon; 
however, on cloudy or rainy days it is not possible to use the 
sun and pigeons then switch to navigation by magnetoception. 
This has been proven by releasing ‘time-shifted’ pigeons (Biro 
et al., 2004). Their blacked-out loft had no sunlight and 
artificial lighting was used to offset sunrise and sunset by 2 
hours from the actual times. When released, ‘time-shifted’ 
pigeons set off at the wrong angle, taking a long detour with 
increased chance of becoming lost, showing that pigeons 
navigate by solar clock when the sun is visible. However, on 
cloudy days the ‘time-shifted’ pigeons were unaffected, flying 
home in a straight line, proving that when the sun is not 
visible, pigeons navigate by magnetoception. 

Breeders generally lose pigeons each year due to hazards 
including farmland shooting, colliding with high sided 
vehicles and power cables. Some losses are unavoidable but 
understanding navigation mechanisms could reduce losses in 
training caused by navigation issues or weather. 

Pigeons are an example of expert visual navigators but it is 
not known how their vision is optimised for navigation or 
how their optical array is optimised for view matching 
strategies to enable orientation and navigation. Between the 
eyes of various aerial animals, there is enormous variation in 
the information provided by visual systems (Gaffin et al. 
2015). Mechanisms of insect navigation have recently been 
investigated by computer simulation (Wystrach et al., 2015). 

Video recording during pigeon flight 

Small lightweight video camera devices have been mounted 
onto a pigeon during flight which revealed the bird’s eye view 
during pigeon navigation (Fig. 1). Results from analysis of 
these videos and real-time GPS tracking data during pigeon 
flights show that pigeons use visual navigation to follow 
roads, rivers, or coastlines which lead towards their loft 
during flights within familiar territory or within the last leg of 
a longer journey (Biro et al., 2004). These findings suggest 
that navigation within one mile of the loft is led by visual 
navigation of recognised landmarks. The outline of landmarks 
must be visible for pigeons to recognise them correctly 
(Gibson et al., 2015). This suggests that navigation by solar 
clock and magnetoception may not be accurate enough for 
locating the exact loft, but guides to within the correct mile. 
This would explain why pigeons become lost when released 
without first seeing and becoming familiar with the loft’s 
surroundings, which is a core part of training for homing 
pigeons. Within visual navigation (i) landmarks, (ii) flocking 
and (iii) familiar scenes play a part in guiding pigeons. This 
research aims to identify their roles in simulated navigation. 



Fig. 1. Video device mounted on a pigeon during flight to 
monitor visual navigation [National Geographic, 2011]. 
Methods - Simulated Pigeon Visual Navigation 

Virtual Reality habitats were generated using the GLUT 
framework to represent environments encountered by pigeons 
in their natural habitat (Fig. 2). For pigeons this consisted of a 
random assortment of plants and trees on hilly ground with 
scattered areas of water. Plants and tree objects were created 
using random configurations based on geometric patterns. The 
size of trees in the virtual environment was based on the scale 
of objects as perceived by the pigeons in their natural habitat 
at flight levels between 0-100m. Plants and trees were 
rendered as 3D objects. A flock of 100 pigeons was 
simulated, each navigating independently. 

Pigeon’s eye view. At each iteration of the simulator, a bird’s 
eye view of each virtual pigeon in the sky was generated and 
stored as an image using a perspective projection view from the 
position (latitude, longitude and altitude) within virtual world 
coordinates and the direction (elevation and azimuth) in degrees. 
The generated image proportions aim to match characteristics 
of pigeon eyes in terms of the degrees of panoramic view 
from the horizon. Various combinations of colour and 
resolution were investigated to represent pigeon’s eye 
characteristics and identify the effect on ability to navigate. 
The pigeon’s eye view images were used for (1) image 
processing to identify landmark features within the image, and 
(2) a neural network to assess familiarity of images. 



Fig. 2. Virtual Simulation of autonomous pigeon during flight. 


Image processing: landmark recognition. Real-time image 
processing algorithms were developed to process the pigeon’s 
eye view images to identify (i) known landmarks and (ii) other 
pigeons to guide flocking behaviour. Recursive image 
processing was used to locate the centroid of each object cluster 
(Fig. 3). A matrix represents all pixels within view. For pixels 
within the landmark RGB threshold, a recursive function tests 
the pixel values of the four surrounding pixels (non-diagonal). 
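
The clustering step can be illustrated with a short sketch. It is not 
the simulator's code: it assumes the RGB thresholding has already produced 
a boolean mask, the function name is hypothetical, and it replaces the 
recursive function with an explicit stack, which visits the same 
4-connected neighbours while avoiding Python's recursion limit on large 
clusters. 

```python
def find_clusters(mask):
    """Find 4-connected clusters of truthy pixels in a 2D mask.

    mask: list of rows, truthy where a pixel is within the landmark threshold.
    Returns a list of (size, (row_centroid, col_centroid)), one per cluster,
    in scan order of each cluster's first pixel.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            stack = [(r, c)]                 # explicit stack instead of recursion
            seen[r][c] = True
            size = sum_r = sum_c = 0
            while stack:
                y, x = stack.pop()
                size += 1
                sum_r += y
                sum_c += x
                # Test the four non-diagonal neighbours.
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            clusters.append((size, (sum_r / size, sum_c / size)))
    return clusters
```

The cluster count gives the number of landmarks (or other pigeons) in 
view, and each centroid gives a bearing within the image. 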

Neural Network Training. A neural network was used to 
assess the familiarity of images from the pigeon’s eye view. 
The ANN was trained with stored images taken at regular 
intervals along a known training route through the habitat, 
from a familiar release site back to the loft (Wystrach et al. 
2015). Distance between stored training images represents the 
pigeon’s memory capability. The neural network required the 
familiar route to be clearly distinguishable from others. 


Fig. 3. Clustering recursion algorithm to identify landmarks 
(legend: landmark, background, recursive process). 


Results and Performance Analysis of Recursion 

The recursive image processing algorithm from a pigeon’s 
view allowed all pixels within landmarks to be identified. The size 
and centre of each cluster were calculated from the sum of pixels. 
This enabled identification of the number, size and location of 
landmarks. This could enable navigation based on the location 
of known landmarks to recover accurate route headings from 
visual perception of landmark locations. The pigeons are 
required to learn a map of the landmark layout in relation to 
the loft. Learning the loft location relative to landmarks is an 
essential part of the training process. The number of other 
pigeons was identified by image processing, enabling flocking 
behaviour by responding to flight paths of other pigeons. 

Conclusion 

In 2016 there is increasing investment in autonomous 
navigation research. Beneficiaries include: driverless cars, 
flying delivery drones, autonomous robotics, game engines, 
auto-pilots and augmented smartphones. 

This research used virtual models of autonomous agents 
within virtual environments. Image processing was used to 
develop visual navigation based on optical performance of 
pigeons. VR can provide a platform to develop and test 
algorithms for visual perception and navigation responding to 
sensory information in real-time. 

The developed visual navigation algorithms from the 
simulator model could be transferred onto a drone with a 
microcontroller and camera to detect landmarks and other 
drones using image processing. This could enable a drone to 
use landmarks and scene familiarity for visual navigation and 
to flock or avoid collisions with other drones. 

Recent research generated preliminary results validating 
feasibility of image processing for autonomous visual 
navigation (Vaughan, 2015). Familiarity of images could 
enable autonomous agents to choose the optimum orientation 
and navigation towards a trained location. 
Abstract 

A framework for predictively linking cell-level signaling with 
larger scale patterning in regeneration and growth has yet to be 
created within the field of regenerative biology. If this could be 
achieved, regeneration (controlled cell growth), cancer 
(uncontrolled cell growth), and birth defects (mispatterning of 
cell growth) could be more easily understood and manipulated. 
This paper looks to create a key part of this preliminary 
framework by using level set methods and a cellular control 
scheme to predict macroscopic regenerative morphology. This 
simulation specifically looks at Xenopus laevis tail regeneration, 
and uses three control regimes to collectively mimic biological 
regeneration. The algorithm shows promise in creating an 
abstracted model to predict cell patterning on a macroscopic 
level. 


Introduction 

If the control of cell growth and tissue patterning can be better 
understood, cancer (uncontrolled cell growth), birth defects 
(mispatterning), and organ regeneration (cell growth harnessed 
toward the repair of complex organs) could be more easily 
manipulated. While the molecular mechanisms of cellular 
control are increasingly understood, the field lacks frameworks 
for predictively linking cell-level signals to large-scale pattern 
controls. This paper looks to leverage methods from continuum 
mechanics to provide new tools for modeling the control of cell 
growth and patterning. To do this, regeneration in tadpole tails 
is modeled as an iteration between two processes. The first 
process is a control scheme, which decides where and when 
tissue should grow or shrink. The second process is a growth 
model that describes changes in tissue morphology due to cell 
division, motion, and growth. 
Figure 1: Two-process system model. The control scheme 
creates local growth commands, which the growth model uses 
to output the spatial distribution of tissue. 


One challenge in modeling regeneration and tissue growth is 
relating the macro (organism) scale to the micro (cell) scale. 
Traditionally, as the scale of the organism increases, so does 
the computational expense of modeling its smallest features 
and interactions. The average human is composed of 37.2 
trillion cells - an unrealistic number of cells and interactions to 
model (Bianconi et al. 2013). Our proposed method is 
advantageous because it treats tissue as a continuum, blurring 
the boundaries between individual cells. This approach avoids 
the problem of managing cells as individual agents, which like 
marbles on a Chinese checkers board, would need to be 
shuffled to open spaces to make room for new marbles. 

What we describe is the difference between tracking 
individual agents and tracking the motion of bulk material 
through a fixed volume of space. Two types of mathematical 
thinking exist to distinguish these types of phenomena. The 
first approach is Lagrangian, meaning cells are tracked on an 
individual basis. Such a method allows cells to operate on their 
own growth rules and is commonly used in biological growth 
modeling (Walker et al. 2004; Rejniak and Anderson 2011). 
Lagrangian models often suffer morphologically from internal 
voids because the individual “cells” cannot directly organize in 
a manner that preserves contact without overlapping. The 
second approach is Eulerian, meaning it focuses on the space 
through which particles move. Eulerian approaches are 
classically employed in modeling fluid flow and heat transfer 
(J. A. Sethian 1985; Osher and Sethian 1988). Such methods 
have not been widely used to model tissue growth and 
patterning; however, they offer great promise to capture the 
effects of microscopic phenomena interacting across 
macroscopic domains. 

Our proposed model uses an Eulerian approach based on the 
level-set method. Level sets use a modified mass balance to 
describe a moving boundary, such as the interface between an 
organism and the surrounding medium. Level set methods are 
used to model crystal growth and combustion, as well as for 
computer vision and microchip fabrication (Osher and Sethian 
1988; J. A. Sethian 1985; J. Sethian 1984; J. A. Sethian, n.d.). 
Level set methods have also seen some use in biological 
modeling (C.S. Hogea, Murray, and Sethian, n.d.) although not 
to our knowledge in a closed-loop feedback scheme for 
patterned growth as illustrated in Figure 1. 

Level set methods use a scalar field to describe a moving 
boundary. The boundary is at the zero values of the scalar field 
and motion of the boundary is determined by assigning a speed 
at each point. The speed function is ultimately what controls 
the development of the boundary. For this biology-motivated 
application, we propose a speed function consisting of three 
main components: isometric control, patterning control, and 
smoothing control. 

This paper uses level sets to simulate the regeneration of the 
amputated tail of a Xenopus laevis tadpole. Xenopus is a simple 
vertebrate that regenerates its tail until early in its life cycle, 
through stage 52 or 53 (Suzuki et al. 2006). This makes 
Xenopus ideal for modeling patterning growth, and 
regeneration in particular, across a macroscopic scale. In this 
work, we conceptualize regeneration as growth that restores 
animal morphology back to a “reference” shape. Our particular 
approach will assume that a global reference map is available 
and that control laws act by setting growth rate based on the 
distance of the organism boundary from the reference. In fact, 
it is not critical as to whether an actual reference “map” might 
exist in an animal system (Friston et al. 2015) or whether the 
“map” is an emergent pattern that results from local control 
decisions (Zhang and Levin 2009; Chernet, Fields, and Levin 
2015). On the simulation scale, both mechanisms are 
functionally equivalent. 

In the following sections, this paper will detail methods used 
to implement the growth model and the control scheme. The 
paper will then go into detail on the simulation used to assess 
these models. Finally, we will present results, conclusions, and 
future work. 


Methods 


Level Set Analogy 

Level set methods were originally created to model combustion 
and two-phase flow (J. Sethian 1984). A level set can be 
conceived as a geographic contour map, where each level set is 
an elevation contour that consists of the set of points at a 
particular elevation. These contours may change over time if 
geography changes. By following a particular contour in time, 
it is possible to model the motion of an interface (e.g. a flame 
front in combustion or a liquid-gas interface in two-phase 
flow). The level set equation is directly related to the scalar 
transport equation. Instead of transporting material however, 
the level set method transports a scalar distance from the 
boundary of interest. 

To make the geographic analogy more concrete, consider a 
particular case - a volcanic island rising out of the ocean 
(Figure 2). In this example, we will track sea level over time, 
as this elevation marks the interface between the island and the 
ocean. As a volcanic eruption takes place and adds material 
over the entire island, the island’s elevation map will evolve 
both on the land side (topography) and on the ocean side 
(bathymetry). The addition of new material will cause the 
island to grow, such that the sea-level elevation contour pushes 
outward as shown in Figure 2. Erosion would have the 
opposite effect: sharp features would be worn down, shrinking 
the boundary. In Figure 2, the elevation is denoted by the scalar 
$\phi$. The interface between land and water is denoted by the 
sea-level contour, with $\phi = 0$. 

This geographic concept can be extended to biological 
modeling. In this paper, we consider a two-dimensional model 
of the Xenopus tail. In our model, the x coordinate corresponds 
to the anterior-posterior direction, the y coordinate to the 
dorsal-ventral direction, and the lateral direction is not 
modeled. The level set field $\phi(x,y)$ is now used to represent 
distance away from the outer surface of the organism. The 
contour that represents the outer surface of the organism is 
labeled T and represents the set of all points where $\phi = 0$. 
Moving inside the organism, the distance-from-contour is 
measured as positive and so $\phi$ is set positive inside the 
organism. By contrast, $\phi$ is set negative outside the organism. 
This mathematical approach lets us track the motion of the 
outer surface of the organism through space as the organism 
grows in time. An illustration of this concept is shown in Figure 
3, which depicts a section of the Xenopus tail. A photo of the 
Xenopus tadpole, in Figure 3(a), shows the tail prior to 
amputation. The outer tail surface can be identified and used to 
generate a binary image, as shown in Figure 3(b), where the 
green region indicates the interior of the organism and the light 
region indicates the exterior. This tail region can also be viewed 
as a level set field, shown in three dimensions in Figure 3(c). 
The height of the field indicates the distance of each (x,y) point 
from the outer surface of the organism. 

Figure 2: (a) Top view of the erupting volcanic island; 
arrows indicate boundary movement. (b) Front view of the 
volcanic island describing the same boundary movement. 
Elevation above sea level is indicated by the variable $\phi$. 
Addition of volcanic material increases elevation and 
translates directly to a change in island circumference. 

Figure 3: Level Set Scalar Field Derivation. (a) Xenopus 
laevis at stage 42. (b) A 2D view of the tail representation. 
This represents the morphology of the tail and is the zero 
level set contour of the scalar field. (c) A 3D representation 
of the level set field. The flat plane is the zero level set. 
The region above that plane represents material inside the body, 
while the region below that plane is outside the body. The 
black line represents the current zero contour. 


Growth Model 


This section describes the level set equations we used to model 
organism growth. Begin by identifying the outer surface of the 
organism in a Cartesian space described by coordinates x and 
y. In two dimensions, the outer surface of the animal is a 
contour, which we label T as shown in Figure 3(b). Now 
construct a scalar field $\phi(x,y)$ that represents distance from the 
contour T, with values increasing interior to the organism 
($\phi > 0$) and decreasing exterior to the organism ($\phi < 0$). In 
order to represent distance, note that the magnitude of the 
gradient of this scalar field must be one at all points where the 
slope of the field is continuous. Slope discontinuities appear only 
at the center of the field, where points are equidistant from 
multiple sections of the T contour, as illustrated by the ridge that 
appears along the midline of the tail in Figure 3(c). 
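
As an illustration of this construction, the following minimal Python sketch builds a signed-distance field for a disk and checks the unit-gradient property with central differences. The disk is a stand-in shape chosen here for simplicity, not the tail geometry used in this paper, and the function names and grid size are assumptions made for the example.

```python
import math

def disk_level_set(nx, ny, cx, cy, radius):
    """Signed distance to a circle: positive inside, negative outside."""
    return [[radius - math.hypot(x - cx, y - cy) for x in range(nx)]
            for y in range(ny)]

def gradient_magnitude(phi, x, y):
    """Central-difference estimate of ||grad phi|| at an interior cell (unit spacing)."""
    dphi_dx = (phi[y][x + 1] - phi[y][x - 1]) / 2.0
    dphi_dy = (phi[y + 1][x] - phi[y - 1][x]) / 2.0
    return math.hypot(dphi_dx, dphi_dy)

phi = disk_level_set(41, 41, cx=20, cy=20, radius=10.0)
near_surface = gradient_magnitude(phi, 30, 20)  # a cell on the zero contour
at_center = gradient_magnitude(phi, 20, 20)     # the peak, where the slope is discontinuous
```

In this example the gradient magnitude is one at the contour and drops to zero at the central peak, which plays the role of the midline ridge described above.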

To model growth, we evolve the scalar field $\phi$ over time. As 
time advances, the level set is propagated using a velocity field 
v(x,y), where the velocity vector is specified at every point in 
the field. The magnitude F of the velocity vector will be set by 
the control scheme, as described below. As the control scheme 
transforms the scalar field, the outer surface of the organism T 
moves in time, representing organism growth. 

The following equation governs the time dynamics of the 
scalar field $\phi(x,y,t)$. 

$$\frac{d}{dt}\phi = S(x,y,t) \qquad (1)$$

Here the full derivative of the scalar $\phi$ is related to a source term 
S. The source term allows for the production of new material 
(or the destruction of old material) at every point in the field. 
Where the source term is zero, there is no change in the total 
amount of material present; in other words, $\phi$ is conserved in 
the absence of a source term. Equation (1) is a classical 
conservation law from continuum mechanics, as might be used 
to model the conservation of mass, momentum, or energy 
(Kundu, Cohen, and Dowling 2011). 

The full derivative is linked to velocity v through the 
following equation. 

$$\frac{d}{dt}\phi(x,y,t) = \phi_t + \nabla\phi \cdot \mathbf{v} \qquad (2)$$


This equation, obtained from standard calculus using the chain 
rule, uses the notation $\phi_t$ to identify the partial derivative of $\phi$ 
with respect to time and the notation $\nabla\phi$ to identify its spatial 
gradient. In this work, $\phi$ contours are assumed always to move 
outward in the direction normal to each existing contour. The 
local unit normal to each contour is defined by the gradient: 

$$\mathbf{n} = \frac{\nabla\phi}{\|\nabla\phi\|} \qquad (3)$$

The velocity vector can be written in terms of a magnitude 
term F multiplied by this local normal. 

$$\mathbf{v} = -F\,\mathbf{n} \qquad (4)$$


The negative sign is introduced here, so that a positive speed 
F corresponds to organism growth (toward lower values of $\phi$). 





Figure 4: Velocity normal to level set contour. All 
movement is normal to each point on the contour. Only the 
magnitude of F determines contour movement. 

A field in which velocity is locally normal to the contour is 
shown in Figure 4. 

Substituting equation (4) into equation (2) gives 

$$\frac{d}{dt}\phi(x,y,t) = \phi_t - F\,\|\nabla\phi\| \qquad (5)$$

Recalling that the gradient is equal to one at all points where it 
is defined, the full derivative becomes 

$$\frac{d}{dt}\phi(x,y,t) = \phi_t - F \qquad (6)$$

where the gradient $\nabla\phi$ is continuous. To avoid issues with 
the gradient being undefined at some locations (at cusps and 
ridges in the $\phi$ field), the velocity magnitude F is restricted to 
be zero at these locations. Thus, by combining equations (1) 
and (6), we obtain the following equation to describe the 
change in the level set field $\phi$ at each point and at each moment 
in time. 

$$\phi_t = \begin{cases} F + S, & \nabla\phi \text{ defined} \\ S, & \text{elsewhere} \end{cases} \qquad (7)$$


Because this equation for propagating the $\phi$ field behaves 
differently in regions where the gradient is either continuous 
or not, it is natural to decompose our solution approach into 
two parts. In the first part of the solution, we update the field 
at each time step assuming that the source term is negligible. 
For this step we use a first-order discretization of equation (7), 

$$\phi(t + \Delta t) = \begin{cases} \phi(t) + F\,\Delta t, & \nabla\phi \text{ defined} \\ \phi(t), & \text{elsewhere} \end{cases} \qquad (8)$$

Assuming a negligible source term is reasonable over short 
time periods; however, over longer periods neglecting the 
source term rapidly degrades the assumption that the gradient 
is unity magnitude (where defined), since a source should exist 
at peaks and ridges (as in the Volcano example of Figure 2). As 
such, the source term must be taken into account somehow. 
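
The explicit update and its side effect on the ridge can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the implementation used in this paper: the disk-shaped field, the uniform speed of 0.5, the unit time step, and the `defined` mask (false only at the central peak) are all hypothetical choices for the example.

```python
import math

def step(phi, speed, dt, defined):
    """One explicit update: phi += F*dt where the gradient is defined, unchanged elsewhere."""
    return [[phi[y][x] + speed[y][x] * dt if defined[y][x] else phi[y][x]
             for x in range(len(phi[0]))] for y in range(len(phi))]

nx = ny = 31
cx = cy = 15
# Signed distance to a disk of radius 5, positive inside.
phi = [[5.0 - math.hypot(x - cx, y - cy) for x in range(nx)] for y in range(ny)]
# The gradient is undefined only at the central peak in this example.
defined = [[(x, y) != (cx, cy) for x in range(nx)] for y in range(ny)]
speed = [[0.5] * nx for _ in range(ny)]

for _ in range(4):
    phi = step(phi, speed, 1.0, defined)

contour_value = phi[15][22]   # cell 7 pixels from the center
peak_value = phi[15][15]      # central peak, never updated
neighbor_value = phi[15][16]  # cell next to the peak
```

After four steps the zero contour has moved outward from radius 5 to radius 7, but the peak (held at 5) now sits below its neighbor (6). The field is no longer a distance function near the ridge, which is the degradation that motivates the treatment of the source term.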

To account for the source term, we use a process called 
reinitialization (J. A. Sethian 1985; Osher and Sethian 1988; 
Brakke 2015; Evans and Spruck 1991). Reinitialization serves 
two purposes: it forces the non-boundary region to have a 
gradient magnitude of one, and it implicitly adds material to the 
whole field to maintain the field’s shape. In particular, we use a 
process called narrowband reinitialization (J. A. Sethian, 
n.d.). The narrowband solution assumes that the location of the 
zero-contour T is predicted accurately. The solution domain is 
then divided into two regions: a region near the zero-contour 
(the interface region) and the region farther from the 
zero-contour (the far field). Values of $\phi$ in the interface region are 
preserved; values in the far field are replaced by computing the 
distance of each location from the zero contour. Although this 
process does not correct the gradient inside the interface region, 
the far field values effectively introduce a boundary condition 
that drives the slope in the interface region back toward its 
correct magnitude (of one). This narrowband approach is 
numerically robust and has been used extensively in other 
applications of the level set method (J. A. Sethian, n.d.). 

A practical issue is that the approximations introduced by 
narrow banding can affect the accuracy of the prediction of the 
zero-contour T. A balance must be struck, however, between 
giving the zero-contour freedom of movement and constraining 
the field to a gradient of one. For this reason, reinitialization is 
not necessarily performed at every time update. In our method, 
we used a boundary width of 2 pixels on either side of the T 
contour and performed reinitialization at a rate of once per 20 
time steps. 
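
A brute-force Python sketch of the narrowband idea follows. It is an assumption-laden illustration, not the routine used in this paper: far-field distances are approximated as the shortest path to the contour through a band cell on the same side, the search is exhaustive (practical only for small grids), and the function name `reinitialize` and the disk test field are hypothetical. The sketch also assumes that the band contains cells on both sides of the contour.

```python
import math

def reinitialize(phi, band_width=2.0):
    """Keep values with |phi| <= band_width; rebuild the far field as a distance."""
    ny, nx = len(phi), len(phi[0])
    band = [(x, y, phi[y][x]) for y in range(ny) for x in range(nx)
            if abs(phi[y][x]) <= band_width]
    out = [row[:] for row in phi]
    for y in range(ny):
        for x in range(nx):
            value = phi[y][x]
            if abs(value) <= band_width:
                continue  # interface region: preserved as-is
            # Far field: distance to a same-side band cell plus that cell's
            # own distance to the contour, minimized over the band.
            dist = min(math.hypot(x - bx, y - by) + abs(bv)
                       for bx, by, bv in band if bv * value >= 0)
            out[y][x] = math.copysign(dist, value)
    return out

n, c, radius = 21, 10, 6.0
exact = [[radius - math.hypot(x - c, y - c) for x in range(n)] for y in range(n)]
# Simulate drift: far-field values are three times too large in magnitude.
drifted = [[v if abs(v) <= 2.0 else 3.0 * v for v in row] for row in exact]
fixed = reinitialize(drifted, band_width=2.0)
```

In this example the drifted value at the center (18) is restored to the true distance (6), and the band values are left untouched.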

In summary, the key idea of the growth model is that a 
velocity field can be assigned to every point in space, allowing 
the organism surface T to be grown without knowing the 
precise location of that surface. This property is in turn useful 
because it permits the simulation of a smooth, continuous 
boundary using a relatively coarse grid. 

Control Scheme 

In this section we describe a control law that can be used in 
conjunction with the level set methodology to define organism 
shape during regeneration. The control law defines a velocity 
field at every point in the simulation domain. As described by 
equation (4), the velocity is locally normal to contours of 
constant $\phi$. The velocity magnitude F is set by the control 
described in this section. Specifically, F is a summation of three 
terms, which are assumed to act independently of each other. 
These terms include patterning control P, isometric control I, 
and smoothing control K. Each term models a distinct aspect of 
biological growth. It is important to note, however, that the 
models are phenomenological in nature and are not derived 
directly from detailed data sets. Taken together, the three terms 
sum to give F. 

F = P + I + K (9) 

All terms in this equation are functions of 2D space and time. 

Patterning Control: Patterning control, P, is the key term for 
this paper as it shapes morphology by enabling local growth. 
The idea is that local cell-level actions may trigger tissue 
deterioration or growth in a small region (as at the regeneration 
site or blastema in an amputated Xenopus tail (King and 
Newmark 2012)). These local actions are responsible for 
regeneration and also for shape changes that occur during 
normal growth. Furthermore, this local activity counteracts 
disturbances, constantly adding or removing tissue to maintain 
an appropriate organism shape under varying environmental 
conditions. In principle, a failure of local patterning might 
result in uncontrolled growth (i.e. cancer). 

In our simulation, we assume that patterning growth is active 
for cells that are near the organism surface T but that are not at 
a desired location. For simplicity, we use a global reference 
map $\phi_{ref}$, which is scaled to the current width of the simulated 
tail. From this map, we derive an error e at each point in the 
simulation domain. 


$$e(x,y,t) = \phi_{ref}(x,y,t) - \phi(x,y,t) \qquad (10)$$


The error for a given element of tissue represents the difference 
between its desired and actual distance from the organism 
surface. The error term is limited to a maximum value, $e_{max}$, to 
reflect a threshold where the cells are so far from their target 
that they grow at a maximum rate. In the region near the 
organism surface, the patterning speed is set to be proportional 
to the error, modulated by a patterning control gain labeled $C_P$. 

This proportionality is capped, however, to reflect a 
maximum cellular growth rate, which is slightly faster than 
nominal, isometric growth. The maximum speed $F_{max}$ is related 
to the maximum cellular growth rate $C_{GC,max}$. 


$$F_{max} = \frac{V_{pat}}{SA}\, C_{GC,max} \qquad (11)$$


Here the variable SA represents the surface area (which in 
2D is the length of the contour where $\phi = 0$). The variable $V_{pat}$ 
represents the volume of tissue that is active in patterning and 
is proportional to $V_m$. The result is that patterning growth is 
nonzero only in the active region; in this region patterning 
growth is nominally proportional to error, subject to saturation 
if the growth rate becomes too large or too small. 


$$P = \begin{cases} C_P\, e, & |e| < e_{max} \text{ and } 0 < \phi < d \\ F_{max}\,\mathrm{sign}(e), & |e| \ge e_{max} \text{ and } 0 < \phi < d \\ 0, & \text{otherwise} \end{cases} \qquad (12)$$
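
The saturating control law described above can be sketched per grid cell as follows. This is a hedged illustration only: the proportional branch is assumed to be the gain times the error, as the text describes, the function name `patterning_speed` is hypothetical, and the value of `f_max` is made up for the example (in the model it would come from the maximum cellular growth rate). The gain, error limit, and depth are the nominal values reported in this paper.

```python
import math

def patterning_speed(phi, phi_ref, gain, e_max, f_max, depth):
    """Patterning term P for one cell: proportional to error, saturated, band-limited."""
    error = phi_ref - phi  # difference between desired and actual distance
    if not (0.0 < phi < depth):
        return 0.0  # outside the active region near the surface
    if abs(error) < e_max:
        return gain * error  # proportional regime (assumed form)
    return math.copysign(f_max, error)  # saturated regime

# Nominal parameters from the paper; f_max = 0.25 is a hypothetical value.
params = dict(gain=0.02, e_max=10.0, f_max=0.25, depth=3.0)

proportional = patterning_speed(1.0, 5.0, **params)    # error = 4
saturated_up = patterning_speed(1.0, 15.0, **params)   # error = 14
saturated_down = patterning_speed(1.0, -20.0, **params)  # error = -21
deep_interior = patterning_speed(5.0, 9.0, **params)   # phi beyond the active depth
exterior = patterning_speed(-1.0, 3.0, **params)       # phi outside the organism
```

Only cells just inside the surface contribute, so tissue far from the boundary and all exterior cells return zero regardless of their error.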


Isometric Control: Our simulation uses an isometric control 
term I to allow for growth that is organism wide (as compared 
to patterning growth, which is local). We assume that this 
growth occurs at the same rate throughout the entire organism, 
such that the organism maintains its shape when only I is active. 
Isometric growth is, in fact, a nominal behavior for some 
organisms such as the flatworm. When food resources are 
plentiful, the flatworm grows uniformly in all directions; when 
the flatworm is starved it shrinks uniformly in all directions 
(Lobo, Beane, and Levin 2012). In Xenopus tadpoles, by 
contrast, some changes in shape occur as the organism grows 
(Love et al. 2013; Suzuki et al. 2006; Chernet, Fields, and Levin 
2015), and so nominal growth combines some aspects of 
isometric control I with patterning control P. 

The isometric control term has a uniform value of $F_V$ 
everywhere in the simulation domain when the term is active. 

$$I = F_V \qquad (13)$$

The isometric model represents constant cellular growth 
with time, meaning the boundary velocity must increase in time 
(as the volume to surface area ratio increases). To account for 
this, the growth speed $F_V$ is computed as 

$$F_V = \frac{V_m}{SA}\, C_{GC,nom} \qquad (14)$$

Here $V_m$ represents the total volume of the organism (or in this 
2D simulation, the area of the tail). The rate $C_{GC,nom}$ represents 
the rate of cellular growth (mitosis), which is modeled to be 
uniform in space. For our simulations we assume that sufficient 
resources are available to the organism to maintain a nominal 
growth rate $C_{GC,nom}$ that is constant in time. 
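
The dependence of the isometric speed on the volume to surface area ratio can be illustrated with a short Python sketch. The pixel-counting estimates used here are assumptions for the example, not the measurement used in this paper: the area is the number of cells with positive field value, and the surface is the number of pixel edges separating interior from exterior, which overestimates the length of a smooth contour. The names `isometric_speed` and `rectangle` are hypothetical.

```python
def isometric_speed(phi, c_gc_nom):
    """Volume-to-surface ratio times the nominal cellular growth rate, from pixel counts."""
    ny, nx = len(phi), len(phi[0])

    def inside(x, y):
        return 0 <= x < nx and 0 <= y < ny and phi[y][x] > 0

    cells = [(x, y) for y in range(ny) for x in range(nx) if inside(x, y)]
    area = len(cells)  # number of interior pixels
    # Number of pixel edges that separate an interior cell from the exterior.
    perimeter = sum(1 for x, y in cells
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if not inside(x + dx, y + dy))
    return area / perimeter * c_gc_nom

def rectangle(width, height, pad=2):
    """Field that is +1 inside a width x height rectangle and -1 in the padding."""
    return [[1.0 if pad <= x < pad + width and pad <= y < pad + height else -1.0
             for x in range(width + 2 * pad)] for y in range(height + 2 * pad)]

small = isometric_speed(rectangle(10, 4), 0.00075)   # area 40, perimeter 28
large = isometric_speed(rectangle(20, 8), 0.00075)   # area 160, perimeter 56
```

Doubling the linear size of the shape doubles the boundary speed, which is the behavior the isometric term needs in order to keep the per-cell growth rate constant.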

Smoothing Control: The final term of the speed function is a 
smoothing control term K designed to eliminate sharp features 
(e.g., corners created by amputation) or to eliminate tissue 
filaments that might (by random chance) begin to develop as 
extensions of the organism surface. This term essentially 
regularizes the organism surface to maintain smoothness. The 
specific mechanism for performing smoothing is to introduce a 
perturbation to the growth rate that is proportional to the local 
curvature $\kappa$ of the $\phi$ contours. The constant of proportionality 
is $C_K$. 

$$K(x,y,t) = C_K\,\kappa(x,y,t) \qquad (15)$$

This concept for smoothing has been employed in other 
applications of level set methods, as described by (J. A. 
Sethian, n.d.; Brakke 2015; Evans and Spruck 1991; Osher and 
Sethian 1988). In our biological application, the curvature term 
not only eliminates spurious features; it is meant to prevent the 
formation of holes and discontinuities that are not represented 
in the reference map. 

By definition, curvature in a level set is the spatial 
derivative of normal vectors along a contour. The more 
rapidly the contour changes direction, the higher will be its 
curvature. Mathematically, curvature can be written 

$$\kappa = \nabla \cdot \mathbf{n} \qquad (16)$$

where the normal vector n is defined by equation (3) above. 
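
As a numerical check of this definition, the Python sketch below estimates the divergence of the unit normal with central differences on a disk-shaped field. The disk, the grid size, and the function names are assumptions chosen for the example; the stencil reaches two cells in each direction, so it needs padding of the kind described in the Implementation section.

```python
import math

def unit_normal(phi, x, y):
    """Normalized central-difference gradient of phi at cell (x, y)."""
    gx = (phi[y][x + 1] - phi[y][x - 1]) / 2.0
    gy = (phi[y + 1][x] - phi[y - 1][x]) / 2.0
    norm = math.hypot(gx, gy)
    if norm == 0.0:
        return 0.0, 0.0  # gradient undefined (peak or ridge)
    return gx / norm, gy / norm

def curvature(phi, x, y):
    """Divergence of the unit normal, by central differences of its components."""
    nx_right, _ = unit_normal(phi, x + 1, y)
    nx_left, _ = unit_normal(phi, x - 1, y)
    _, ny_up = unit_normal(phi, x, y + 1)
    _, ny_down = unit_normal(phi, x, y - 1)
    return (nx_right - nx_left) / 2.0 + (ny_up - ny_down) / 2.0

c, radius = 20, 10.0
# Signed distance to a disk of radius 10, positive inside.
phi = [[radius - math.hypot(x - c, y - c) for x in range(41)] for y in range(41)]
kappa = curvature(phi, c + 10, c)  # a cell on the zero contour
```

With the field positive inside, the normal points inward and the curvature of a convex contour comes out negative, close to minus one over the radius. A positive smoothing gain therefore erodes convex features such as corners.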

Each control regime can be linked directly to microscale 
cell behavior that impacts macroscale morphology. Patterning 
control begins by considering a maximum growth rate of 
individual cells during regeneration, multiplies that growth by 
the number of patterning cells, and then distributes that growth 
along the growing boundary. Similarly, isometric control begins by 
considering the natural growth rate of cell division, multiplies 
that growth by the number of cells in the tail, and distributes this 
growth over the body boundary by dividing by the surface area of the 
tail. Smoothing control is not as simply related to cell growth, 
but instead represents cohesiveness in the cellular matrix by 
minimizing areas of high curvature or irregularity. 

Implementation 

Each time step is represented by a single iteration of the main 
loop shown in Figure 5. In general, the simulation is allowed to 
run for 15,000 time steps, but this number must be adapted to 
grid resolution and control coefficients. 

The level set $\phi$ is stored as a two-dimensional array on a 
Cartesian x-y grid. In this simulation the size of the grid was 
301 pixels (anterior-posterior) and 135 pixels (dorsal-ventral). 
Two extra cells pad the field on each edge to simplify gradient 
and curvature calculations. 


Computing $\|\nabla\phi\|$ (the block labeled Magnitude of Gradient of Phi 
in Figure 5) introduces a potential source of numerical 
error, as derivative operations amplify numerical errors. 
Therefore, a weak Gaussian filter was introduced in this block 
to smooth gradient values. The filter uses a two-dimensional 
Gaussian kernel with a standard deviation of 0.25. 
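
To give a sense of how weak this filter is, the sketch below tabulates a normalized Gaussian kernel in Python. Two assumptions are made for the example: the standard deviation is taken to be in units of grid cells, and the kernel support is 3 x 3 cells. Neither detail is stated above.

```python
import math

def gaussian_kernel(sigma=0.25, radius=1):
    """Normalized 2D Gaussian weights on a (2*radius + 1)^2 stencil."""
    weights = [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
                for dx in range(-radius, radius + 1)]
               for dy in range(-radius, radius + 1)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]

kernel = gaussian_kernel()
center_weight = kernel[1][1]  # weight on the cell itself
edge_weight = kernel[1][0]    # weight on a directly adjacent cell
kernel_sum = sum(sum(row) for row in kernel)
```

Under these assumptions more than 99.8% of the weight stays on the center cell, so the filter damps only cell-to-cell noise in the gradient and leaves the overall field essentially unchanged.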

At each step, the surface of the organism ($\phi = 0$) may lie 
between cells. No attempt is made to interpolate the actual 
surface (e.g. red contour shown in Figure 6). Rather, figures in 
this paper report all cells with $\phi > 0$ as being part of the 
organism and all cells with $\phi < 0$ as being exterior to the 
organism. 

Figure 6: The boundary (red curve) may lie between cell 
centers, as shown on this grid. 

Results of Simulation Verification 

A suite of four test cases was simulated in order to verify the 
basic functionality of the algorithm. The four test cases all 
considered a simulation domain modeling a stereotypical 
Xenopus tadpole tail. The final tail morphology is derived from 
Reid et al. (Reid, Song, and Zhao 2009) and The Normal Table 
of Xenopus laevis (Faber and Nieuwkoop 1967). The four tests 
include (a) no growth, (b) patterning-based regeneration 
following amputation, (c) nominal isometric growth, and (d) 
nominal isometric growth and simultaneous regeneration 
following amputation. These test cases were selected 
respectively to examine algorithm stability, performance of 
patterning control (in isolation), performance of isometric 
control (in isolation), and performance of combined control 
terms. 

Patterning and isometric control parameters are derived from 
experimental regeneration data. All four test cases have (unless 
otherwise noted) $C_{GC,nom} = 0.00075$, $C_K = 0.001$, $C_P = 0.02$, 
$d = 3$, $e_{max} = 10$, $C_{GC,max} = 0.1$, and 1 time step = 12 minutes, and are 
run for 15,000 time steps. The patterning control coefficient 
($C_P$) was found using the Reid et al. image sequence and 
analyzing its length growth rate. The isometric control coefficient 
($C_{GC,nom}$) was found using the normal table of Xenopus laevis 
stage series (Faber and Nieuwkoop 1967). The pixel area of the 
organism was evaluated from experimental images, between 
stages 40 and 52. Data were plotted and the value of the growth 
rate was determined by a linear fit of the data. 

Figure 5: Implementation of primary simulation loop. 

Figure 7: Verification tests: (a) No Growth, (b) Regeneration Following Amputation, (c) Nominal Growth, (d) Nominal Growth 
and Simultaneous Regeneration Following Amputation. The rightmost graph shows contributions of individual control terms to 
F, integrated over the simulation domain. 

In analyzing the four test cases, it is useful to consider image 
sequences that illustrate the growth process. Figure 7 shows 
four image sequences, one for each test case. At the end of 
each row, an additional plot shows growth contributions per 
time step for each term (P, I, and K). Each image sequence 
starts from the left; the green body is the current body shape, 
and the black outline is the final target reference. In the growth 
contributions plot (far right) the horizontal axis indicates time 
and the vertical axis indicates the growth contribution (for P, I, 
and K) integrated over the entire organism for each time step. 
Growth is measured in terms of grid cells, or pixels, that 
the organism fills. Hence the units of the vertical axis can be 
considered to be pixels per unit time. The area under each of 
the growth contribution curves can loosely be viewed as the 
overall contribution of material due to each growth contribution 
(P, I, and K) from the start of the simulation. 

Case A: No Growth 

The first test case looks at the stability of the patterning 
algorithm by considering a fully-grown tail where the surface of 
the organism is already at its reference location and where the 
nominal growth term is shut off ($C_{GC,nom} = 0$). One would 
expect no change in body shape, and little to no 
contribution from any of the active control regimes, since the 
initial organism shape matches the reference contour. Indeed, 
the image sequence in Figure 7(a) shows qualitatively that the 
tail remains stationary. Growth contributions are tiny but 
nonzero (smaller than 0.1 pixel/time step, as shown in the 
growth contributions plot at the end of the row). Though 
nonzero, the growth contributions from patterning (positive) 
and smoothing (negative) are essentially balanced. The 
implication is that the smoothing term is continually active at a 
low level and that the patterning term compensates, such that 
the two remain in static equilibrium. The fact that the system 
reaches equilibrium indicates that the algorithm is in fact stable. 

Case B: Regeneration Following Amputation 

The second test case considers a simplified model of 
regeneration following amputation, with no nominal organism 
growth. The purpose of the test case is to examine the 
performance of the patterning control term. The amputation is 
performed digitally, with the tail being “cut” at the initial time 
to leave tissue only in the 10 leftmost grid cells of the image. 
For this simulation, nominal growth is again disabled 
($C_{GC,nom} = 0$). Under these conditions, we would expect the tail 
to return to its nominal shape (pre-amputation), with tissue filling 
the reference map much as material might flow into a mold. 

Figure 7(b) shows that by time step 15,000, the tail has in 
fact regenerated to its nominal shape and size. Note that, for 
Case B only, the patterning growth rate was reduced relative to 
its nominal value (to $C_P = 0.005$), in order to better visualize the 
growth process. As shown in the image sequence, the corners 
created in the amputation persist through time step 5,000. 
Together the smoothing term and the reference image (which 
inhibits patterning growth when the organism boundary reaches 
the reference boundary) introduce more curvature into the 
regenerating tail, as seen by time step 10,000. The smoothing 
term becomes more active as the tail becomes sharper, but 
eventually the patterning term overcomes the smoothing term 
to fill the pointed tip of the tail, as seen in time step 15,000. 

The growth contribution plot shows that total patterning 
growth ramps up slowly to a peak around time step 4,000 
before tapering toward zero approaching time step 10,000. The 
explanation is that initially the amputation boundary grows 
nearly straight (toward the right), such that the total patterning 
growth (which is proportional to the size of the amputation 
surface) is nearly constant with some slight rise due to the 
curvature appearing at the top and bottom corners of the 
amputated face. As the tail narrows, the surface area of the 
amputation face shrinks and the amount of patterning 
growth falls quickly toward zero. Note that the scale of the 
growth contribution plot for Case B is zoomed out by 100 times 
relative to that of Case A, indicating much higher growth 
rates in Case B (as would be expected). 

Case C: Nominal Growth 

The third test case mimics nominal growth with no amputation 
or regeneration. In this example, the initial organism is set to 
the same shape as the initial organism in Case A, but scaled 
down in size by 50%. In concept, one would expect only 
isometric control would be necessary to grow the tail, even with 
patterning control active. 

The simulation indicates that the organism shape is 
preserved during growth, as shown in Figure 7(c). The growth 
contributions plot for this row shows that the majority of all 
growth is generated by the isometric growth term. Though the 
patterning growth term is active, it remains essentially zero 
throughout the simulation, as expected. (The exception is a 
small spike near time step 15,000, triggered by a combination 
of the reinitialization process and by erosion due to smoothing). 
The final image of the sequence shows that the simulated organism 
overgrows the reference contour slightly, as can be observed 
near the tip of the tail. This occurs largely because of the 
action of the smoothing term, which rounded the tail during 
otherwise isometric growth. 

As a final note, it is worth observing that the organism 
growth rate increases slightly over time, as is evident from the 
growth contributions plot at the end of row (c) in Figure 7. The 
acceleration of growth over time matches the intent of equation 
(14), which was designed to keep the rate of cellular division 
constant, with the implication that the organism's total growth 
(pixels added per time step) should become faster as the 
organism becomes larger. 

Case D: Nominal Growth and Regeneration 

A final test case combines Cases B and C to provide a more 
realistic model for amputation, one in which patterning growth 
occurs in parallel with nominal growth. For this case, all three 
control terms are active simultaneously (with control 
parameters set to their nominal values). 

The Case D simulation confirms that the patterning and 
isometric growth terms complement each other when they are 
both active, allowing the organism to change size and shape 
simultaneously. It is perhaps surprising to observe that the 
combined growth (Case D) image sequence much more closely 
resembles nominal growth (Case C) than regeneration (Case 
B). In fact, as early as time step 5000 of the image sequence, 
Case D and Case C appear qualitatively the same, even though 
the initial conditions (at time step 1) are entirely different. The 
similarity of the image sequences can be explained by 
examining the growth contributions plot, which shows that 
the patterning growth term is most active early, approximately 
through time step 1000. In fact, the shape of the patterning 
growth curve for Case D is nearly identical to what was 
observed for Case B, but with a smaller peak amplitude and 
scaled to a shorter time scale (about five times faster 
completion of patterning as compared to Case B). The 
shortened duration of patterning growth is related to the choice 
of patterning growth coefficient C p (which was reduced in Case 
B) and to the initial condition of Case D (which has half the 
width of the initial condition for Case B, such that the velocity- 
to-length ratio is increased in Case D). 

At first glance, it appears that the growth contribution plot 
for Case D suggests some interaction between the isometric and 
patterning growth terms, since the isometric growth rate in 
Case D appears to dip for the first 1000 time steps as compared 
to Case C. This difference in the initial isometric growth can 
more simply be explained as a size effect, however, rather than 
an interaction. Since the total amount of isometric growth 
scales with the amount of material in the organism, and since 
the amount of simulated material is very low post amputation 
(in Case D), it should not be surprising that the volumetric 
growth rate is initially much lower in Case D than Case C. 

Discussion 

The verification tests described in the prior section, and in 
particular the Case D test, suggest that this simulation can 
provide a relevant model for simulation of Xenopus 
regeneration. In the Case D simulation, morphology is 
regenerated while the tail grows in size, a behavior seen in 
Xenopus regeneration. Importantly, the verification tests of the 
prior section also demonstrate that the simulation is stable and 
qualitatively well behaved. The patterning control switches off 
when the organism shape approaches the reference map 
(Case A). The patterning and isometric growth terms perform 
as expected when active individually (Case B and Case C, 
respectively) and when active simultaneously (Case D). The 
smoothing control 
term was active in all cases, providing small adjustments to 
regularize shape (as visible in Case B in particular), but always 
resulting in a very small contribution to the overall growth of 
the simulated organism, so small that the contribution was only 
visible when magnifying the scale of the growth contribution 
plot (as in Case A). 

The simulation does have limitations. First, the 
reinitialization process introduces slight irregularities, since 
reinitialization occurs only periodically (once every 20 time 
steps). The result is that the growth contribution plots can 
appear slightly choppy (as is visible in the saw tooth pattern for 
patterning growth in Case C and Case D). Reinitialization is 
also somewhat computationally intensive. As such, an 
alternative to reinitialization may be pursued in the future. 
Second, smoothing control effects make it difficult to generate 
sharp corners. Some modification to the smoothing control may 
be necessary in the future to allow sharp features to develop 
when desired (as in the tip of the simulated tail). Third, at small 
tail sizes, the discrete nature of the patterning control reference 
map can introduce dithering when the reference map is 
rescaled. To mitigate this effect, we will consider alternate 
representations of the reference map in the future. Since the 
current reference map is binary (with a one indicating a grid 
point inside the organism and a zero indicating a grid point 
outside), a floating point representation of the reference map 
would likely be helpful to reduce dithering that occurs when the 
reference map is scaled. 

The intended application of our simulation tools is to 
examine control policies and sensing modalities that might be 
used during regenerative growth. Future studies will validate 
our simulation tools through direct comparison to biological 
studies of Xenopus tail regeneration. Also, we will augment our 
current simulations with new models of control and sensing 
with the goal of explaining biological observations about the 
impact of external factors (electrical, chemical, damage, etc.) 
on regenerative growth. 

Conclusion 

This algorithm creates a simplified abstraction of cell 
regeneration morphology, using level set methods and control 
regimes, that is a base module for a future framework to 
predictively link cell-level signaling to macroscopic patterning. 
This ultimate framework may provide insight into regeneration, 
cancer, and even birth defects. This algorithm reduces complex 
cellular interactions into body boundary movement using three 
control regimes - patterning control, isometric control, and 
smoothing control. Patterning control mimics regeneration at 
wound sites and acts on the body boundary. Isometric control 
mimics bulk growth of the organism with time, and smoothing 
control regularizes growth by reducing high curvature regions. 
Looking specifically at Xenopus laevis tail regeneration, this 
algorithm shows promise in predicting cell patterning on the 
macroscopic scale. Although this paper specifically discusses 
simulation of a Xenopus tail in two dimensions, the 
methodology is general enough to be applied to arbitrary 
morphologies in both two and three dimensions. 
Abstract 

Birdsong may be regarded as a complex adaptive system. In 
this paper we study the relationship between complexity and 
consistency of an evolving model of Cassin’s Vireo syntax, 
using a genetic algorithm to approximate a Minimal Consistent 
Deterministic Finite-state Automaton (MCDFA) capable 
of accepting vocal sequences produced by birds of the study 
species. Our results imply that, despite the complex vocal 
behaviour of this species, the complexity of the model can be 
reduced considerably while still encompassing all of the 
positive samples and retaining the ability to exclude similar 
negative samples. These results suggest the existence of 
important regularities in the song sequences of this species. 

Introduction 

The study of birdsong as a science dates back to the 1950s, 
when the first sound spectrographs became available, enabling 
researchers to objectively study and specify its structure 
with an unprecedented level of detail. Since then, much 
research has focused on deciphering its purpose, its learning 
mechanisms, and its meaning. 

There have been considerable efforts in the past towards 
understanding the structural rules that govern birdsong pro¬ 
duction (Honda and Okanoya, 1999; Berwick et al., 2011). 
However, due to the vast diversity of songbird species and 
the apparent structural differences in the songs they produce 
there is still no consensus on the limits of their complexity. 

Similarly, birdsong has proven to be an excellent platform 
for exploring a variety of topics that are relevant to artificial 
life research, such as complexity, coevolution and sexual 
selection, among others (Taylor and Cody, 2015; Sasahara 
and Ikegami, 2003, 2004). 

Our research is currently focused on the acoustic monitor¬ 
ing of different species of birds in several areas of the US and 
Mexico where they are abundant (Arriaga et al., 2013). Our 
long term goal is to understand the structure and function 
of birdsong. Particularly, the research described here aims 
at analysing the trade-off between complexity and consis¬ 
tency when learning a model representation of the syntactic 
rules governing birdsong structure of Cassin’s Vireo (Vireo 
cassinii ), a songbird possessing a complex vocal behaviour. 


More precisely, we explore how the complexity of the 
model describing the syntax of CaVi song could be reduced 
while avoiding overgeneralisation. Toward that goal, we 
use a genetic algorithm to approximate a Minimal Consistent 
Deterministic Finite-state Automaton (MCDFA) capable 
of accepting sequences produced by birds of the Cassin’s 
Vireo species and rejecting a collection of artificially 
generated negative sequences. 

The problem of finding an MCDFA from a set of examples 
S is NP-complete (Gold, 1978). Furthermore, even finding an 
approximately small consistent DFA has been proven not to be 
solvable in polynomial time unless P = NP (Pitt and Warmuth, 
1989). Given these constraints, approaching the problem as 
an optimisation task using evolutionary computation is natural, 
as exhaustive methods are not tractable. 

Previous research in the area (Kakishita et al., 2009) 
has already tackled the problem of finding a minimised 
automaton representation from observed song sequences; 
however, this approach assumes that the target representation 
is a k-reversible language. Although this assumption is backed by 
some biological plausibility, we consider exploring methods 
not bound by such a priori assumptions to be an interesting line 
of research. 

Materials and Methods 
Study Species 

Cassin’s Vireo ( CaVi ) is a small migratory songbird that 
breeds throughout western North America during the April- 
July period. Individuals are territorial, establishing and de¬ 
fending their breeding grounds from other males. Singing is 
exclusive to males and is primarily used during the breeding 
season (Goguen and Curson, 2002). In our studied 
population each male possesses a repertoire comprised of 
51 ± 5 (mean ± standard deviation) highly stereotyped 
phrase-types, each lasting between 0.20 and 0.66 seconds 
(Hedley, 2016). Individuals sing persistently throughout the 
day, delivering phrases at varying rates: from more than one 
phrase per second during periods of high vocalisation pro¬ 
duction, to single phrases delivered between silence inter¬ 
vals of several seconds long (Hedley, 2016). 
Field Recordings 

All recordings were collected on private land five kilometres 
north of the town of Volcano in Amador County, California 
(10 S 706584 4262742) by Richard Hedley throughout the 
breeding season, from April 25 to June 28, 2013 and from 
May 5 to June 25, 2014. This period was deliberately cho¬ 
sen to account for possible seasonal variations in singing be¬ 
haviour and to increase the likelihood of observing the ma¬ 
jority of the temporal variability of the songs, as song output 
in the species is concentrated in the breeding season. 

Recordings were made using a Marantz PMD 661 
solid-state digital recording unit and a Sennheiser MKH20- 
P48 microphone with a Telinga parabolic reflector. Each 
recording session began when a bird was heard singing, and 
ended either when the bird stopped for a significant amount 
of time or flew away, becoming inaudible. Recordings were 
thus opportunistic in nature, without clear beginnings or 
endings in some instances. 


Recording Annotation 

Using the linguistics program Praat (Boersma and Weenink, 
2014), each recording was subsequently annotated, identify¬ 
ing and categorising distinct phrase-types through visual in¬ 
spection of their spectrograms. A unique two letter code ( aa , 
ab , ac , etc.) was assigned to each phrase-type and a spec¬ 
trogram image of the phrase-type was added to a reference 
catalogue for future phrase identification. Figure 1 shows 
spectrogram representations of three distinct phrase-types, 
as recorded from two different individuals in our study pop¬ 
ulation. 
Figure 1: Spectrogram images of three distinct phrase-types, 
recorded from two different individuals. Panels a) and d) 
illustrate phrase-type ai; panels b) and e) illustrate 
phrase-type bj; and panels c) and f) illustrate phrase-type br. The 
first three exemplars (a-c) were recorded from the Meadow 
individual, while the last three (d-f) were recorded from the 
Sign individual. 


Phrases belonging to each phrase-type were remarkably 
stereotyped, such that each phrase was readily assigned a 
phrase-type label by visually inspecting a spectrogram of the 
signal. We determined that this method of phrase identifica¬ 
tion was objective by subsequent classification with a vari¬ 
ety of machine learning methods (Tan et al., 2015; Kantapon 
et al., 2015); these agreed almost perfectly with the human 
identification. This method has been found to be more than 
99% accurate for annotation of phrase-types for the purpose 
of syntactic analysis (Hedley, 2016). We identified a total 
of 128 distinct phrase-types among the 15 different individ¬ 
uals in our study. This group of 15 birds represents the en¬ 
tire male population of CaVi individuals living within the 
1-square kilometre valley of our study site. 

Recordings were also tagged with the designated name of 
the individual bird that produced them. This was done by the 
recordist by means of visual identification and territory 
analysis, and was later verified using an ensemble of 
machine learning methods (Arriaga et al., 2016). 
Table 1 shows an example of some recordings as annotated 
phrase sequences. 

Table 1: Example of three recordings as sequences of 
phrases after being manually annotated. Each two-letter 
code corresponds to a distinct phrase-type identified within 
the bird species. 

Individual Annotated Recording 

agbk ah, ai, ah, aj 

ah, ai, en, aj 

fg, em, eg, cr, fq 

ai, en, ai, aj, en, en, ak 

fg, em, ck, fg, ca, fg, em 


Data Preparation 

Recordings were divided into phrase sequences by grouping 
phrases sung with pauses of no more than ten seconds between 
them. All phrase sequences were then grouped by 
the individual that produced them. Since all our data 
consist of observed sequences, all sets consist entirely of 
positive samples (S+). Negative samples are fundamental 
to avoid excessive generalisation of the model. For this 
purpose we artificially generated a negative sample $S_{-1}$ of 
phrase sequences by randomly sampling phrases from the 
collection of all observed phrase-types for each individual. The 
sequences produced are thus composed of uniformly distributed 
phrase-types and phrase-type combinations, or n-grams. 
Previous research on the composition of observed 
sequences has shown non-random patterns of sequential 
distribution of phrase-types and n-grams (Hedley, 2016). 
Although we cannot guarantee that all of these simulated phrase 
sequences are impossible for the birds’ syntax to produce, 
their statistical composition makes them highly unlikely. 
Additionally, a second set of negative examples $S_{-2}$ was 
generated by randomly sampling phrases from the collection of 
all observed phrase-types for each individual while maintaining 
the same phrase-type frequency distribution as the observed 
data. Figure 2 (a) and (d) show the total number of occurrences 
of distinct phrase-types and bigrams ordered by rank 
for the observed data of individual agbk, compared to those 
in the simulated data $S_{-1}$, (b) and (e), and $S_{-2}$, (c) and (f). 
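
The two sampling schemes can be sketched as follows. This is a minimal illustration only, not the authors' code: the function name, the choice to draw sequence lengths from the observed lengths, and the rejection of sequences that coincide with an observed one are all assumptions that the text does not specify.

```python
import random
from collections import Counter

def negative_samples(observed, n_sequences, mode="uniform", seed=0):
    """Generate artificial sequences from the phrase-types seen in `observed`.

    mode="uniform":   every observed phrase-type is equally likely (S_{-1}-like).
    mode="frequency": phrase-types are drawn in proportion to their observed
                      counts (S_{-2}-like).
    """
    rng = random.Random(seed)
    counts = Counter(p for seq in observed for p in seq)
    phrase_types = sorted(counts)
    weights = None if mode == "uniform" else [counts[p] for p in phrase_types]
    lengths = [len(seq) for seq in observed]       # assumption: reuse observed lengths
    observed_set = {tuple(seq) for seq in observed}
    samples = []
    while len(samples) < n_sequences:
        k = rng.choice(lengths)
        seq = tuple(rng.choices(phrase_types, weights=weights, k=k))
        if seq not in observed_set:                # assumption: skip observed sequences
            samples.append(seq)
    return samples

# Three of the annotated sequences listed in Table 1.
observed = [
    ("ah", "ai", "ah", "aj"),
    ("ah", "ai", "en", "aj"),
    ("fg", "em", "eg", "cr", "fq"),
]
s_minus_1 = negative_samples(observed, 5, mode="uniform")
s_minus_2 = negative_samples(observed, 5, mode="frequency")
```

In the frequency mode only the unigram distribution is preserved; the order of phrases is still random, which is what makes such negatives harder to separate from real song.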

Minimal Consistent Deterministic Finite-State 
Automata 

A Deterministic Finite-State Automaton (DFA) is a quintuple 
$A = (\Sigma, Q, q_1, F, \delta_N)$, where $\Sigma$ is an alphabet, $Q$ is a finite 
set of states, $q_1 \in Q$ is the initial state, $F$ denotes the set of 
final states (both accepting, $F_A$, and rejecting, $F_R$), and 
$\delta_N : Q \times \Sigma \to Q$ is a transition function. 

The minimal DFA consistent with a given sample S is 
simply a DFA with at most opt states that accepts all positive 
examples in S+ and rejects all negative examples in S-, where 
opt is the least number of states possible for the given sample. 
Given the hardness of the problem, an exhaustive search 
of all possible solutions is not practical for most cases. 
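
To make the notion of consistency concrete, the following sketch represents a DFA as a transition dictionary and checks it against a sample. The class and function names are hypothetical, and one simplification is assumed: a missing transition is treated as rejection, rather than modelling the rejecting states $F_R$ explicitly.

```python
class DFA:
    """Minimal DFA: transitions map (state, symbol) -> state."""

    def __init__(self, initial, accepting, transitions):
        self.initial = initial
        self.accepting = set(accepting)
        self.transitions = dict(transitions)

    def accepts(self, sequence):
        state = self.initial
        for symbol in sequence:
            key = (state, symbol)
            if key not in self.transitions:
                return False          # assumption: undefined transition rejects
            state = self.transitions[key]
        return state in self.accepting


def is_consistent(dfa, positives, negatives):
    """True if the DFA accepts every positive and rejects every negative."""
    return (all(dfa.accepts(s) for s in positives)
            and not any(dfa.accepts(s) for s in negatives))


# A five-state DFA accepting "ah ai ah aj" and "ah ai en aj".
dfa = DFA(
    initial=0,
    accepting={4},
    transitions={(0, "ah"): 1, (1, "ai"): 2, (2, "ah"): 3,
                 (2, "en"): 3, (3, "aj"): 4},
)
positives = [("ah", "ai", "ah", "aj"), ("ah", "ai", "en", "aj")]
negatives = [("ai", "ah", "aj"), ("ah", "ai", "aj")]
consistent = is_consistent(dfa, positives, negatives)
```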

Genetic Algorithm 

A genetic algorithm was used to explore possible models 
for the observed data for each individual. All experiments 
were performed in the Python programming language (van 
Rossum and Drake, 2001), using the NetworkX (Hagberg 
et al., 2008) and DEAP (Fortin et al., 2012) packages. 
Positive samples (S+) were used to build a Maximal Canonical 
Automaton (MCA), a star-shaped Non-Deterministic Finite-state 
Automaton (NFA) with one branch for each sequence in S+. 

In other words, an MCA is an automaton that accepts exclusively 
the sequences observed in the data. See Figure 3. 

An MCA is also structurally complete with respect to the 
positive sample, as every transition and acceptor state is used 
at least once when parsing the observed strings. We thus 
defined our search space as the subset of all automata consistent 
with S which are structurally complete. Since we are interested 
in finding a DFA, non-deterministic transitions were 
removed from the MCA. The resulting automaton is equivalent 
to a Prefix Tree Acceptor (PTA). See Figure 4. 
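
A PTA can be built directly from the positive sequences by sharing common prefixes. The sketch below uses plain dictionaries rather than NetworkX to stay self-contained; the function name and the state numbering (root is state 0, new states numbered in order of creation) are illustrative assumptions.

```python
def build_pta(sequences):
    """Build a Prefix Tree Acceptor from positive sequences.

    Returns (n_states, transitions, accepting), where transitions maps
    (state, symbol) -> state and state 0 is the root.
    """
    transitions = {}
    accepting = set()
    n_states = 1                      # the root already exists
    for seq in sequences:
        state = 0
        for symbol in seq:
            key = (state, symbol)
            if key not in transitions:
                transitions[key] = n_states   # create a new branch node
                n_states += 1
            state = transitions[key]
        accepting.add(state)          # the end of each sequence is an acceptor
    return n_states, transitions, accepting


# Two sequences sharing the prefix "ah, ai".
n_states, transitions, accepting = build_pta([
    ("ah", "ai", "ah", "aj"),
    ("ah", "ai", "en", "aj"),
])
```

Because the two sequences share their first two phrases, the PTA here has seven states instead of the nine that a star-shaped MCA with separate branches would need.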

Population 

Individuals were defined as partitions over the PTA in order 
to explore the search space. Given a partition $\Pi$ of the states of 
a DFA $A$, the quotient automaton 
$A/\Pi = (\Sigma, \hat{Q}, \hat{q}_1, \hat{F}, \hat{\delta}_N)$ is defined as follows: 

• $\hat{Q} = Q/\Pi$ is the set of equivalence classes defined by the 
partition $\Pi$. 

• $\hat{\delta}_N$ is a function $\hat{Q} \times \Sigma \to 2^{\hat{Q}}$ such that 
$\forall \hat{q}, \hat{q}' \in \hat{Q}$, $\forall a \in \Sigma$: 
$\hat{q}' \in \hat{\delta}_N(\hat{q}, a)$ if and only if 
$\exists q \in \hat{q}, \exists q' \in \hat{q}' : q' \in \delta_N(q, a)$. 

• $\hat{q}_1$ is the initial state. 

• $\hat{F}$ is the set of final states. 

Figure 2: Phrase-type and bigram occurrences sorted by 
rank out of sequences of 11838 phrases. (a) Observed 
phrase-types in agbk samples; (b) generated phrase-types in 
negative samples from a uniform distribution ($S_{-1}$); (c) generated 
phrase-types in negative samples from the same distribution 
as the observed samples ($S_{-2}$); (d) observed bigrams in 
agbk samples; (e) generated bigrams in negative samples 
from a uniform distribution ($S_{-1}$); (f) generated bigrams in 
negative samples from the same distribution as the observed 
samples ($S_{-2}$). 

Figure 3: Maximal Canonical Automaton derived from sample 
sequences shown in Table 1. Nodes with a thicker line 
denote acceptor states. 

Figure 4: Prefix Tree Acceptor derived from sample sequences 
shown in Table 1. Nodes with a thicker line denote 
acceptor states. 

In short, given a partition $\Pi = (P_0, P_1, \ldots, P_k)$, in which 
each $P_i$, $i \le k$, represents a group of states from $Q$, the 
automaton $A/\Pi$ is the result of merging together all of the 
states in each $P_i$. Individuals were encoded as a string 
$(x_0, x_1, \ldots, x_n)$, where $x_i$ represents the group to which 
the corresponding node in the PTA belongs and n is equal 
to the total number of nodes in the PTA. For example, the 
partition [[2, 21], [20], [7], [0], [14, 17], [15], [19], [6], [8], [13], 
[9, 10, 11, 18], [5, 16], [3], [12], [4], [1]] is encoded by the string 
(3, 22, 0, 16, 19, 14, 8, 2, 9, 13, 13, 13, 17, 12, 4, 6, 14, 4, 13, 7, 1, 0). 
See Figure 5. 
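
The encoding can be checked mechanically. The sketch below decodes a group-label string into the corresponding partition and merges the states of a transition table accordingly; function names are hypothetical, and the merged transition function is kept set-valued because merging may introduce non-determinism, which the text does not say how to resolve.

```python
def decode_partition(encoding):
    """Group node indices by their label, ordered by label value."""
    groups = {}
    for node, label in enumerate(encoding):
        groups.setdefault(label, []).append(node)
    return [groups[label] for label in sorted(groups)]


def quotient(transitions, accepting, encoding):
    """Merge states that share a label.

    transitions: (state, symbol) -> state of the original automaton.
    Returns ((block, symbol) -> set of blocks, set of accepting blocks).
    """
    merged = {}
    for (state, symbol), target in transitions.items():
        merged.setdefault((encoding[state], symbol), set()).add(encoding[target])
    merged_accepting = {encoding[state] for state in accepting}
    return merged, merged_accepting


# The example string from the text, one label per PTA node (22 nodes).
encoding = (3, 22, 0, 16, 19, 14, 8, 2, 9, 13, 13, 13,
            17, 12, 4, 6, 14, 4, 13, 7, 1, 0)
partition = decode_partition(encoding)

# A toy four-state tree in which the two leaves are merged.
toy_transitions = {(0, "a"): 1, (1, "b"): 2, (1, "c"): 3}
toy_merged, toy_accepting = quotient(toy_transitions, {2, 3}, (0, 1, 2, 2))
```

Decoding the example string recovers the sixteen groups listed in the text, in the same order, since the groups there are sorted by label.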



Figure 5: Automaton derived from applying partition 
(3, 22, 0, 16, 19, 14, 8, 2, 9, 13, 13, 13, 17, 12, 4, 6, 14, 4, 13, 7, 1, 0) 
to the automaton shown in Figure 4. Nodes with a thicker 
line denote acceptor states. 


Thus, the initial population was generated as a collection 
of 100 random partitions over the PTA. 
Genetic Operators 

Crossover: each generation, pairs of individuals from the 
population are chosen through tournament selection with 
a size of three and a probability of 95% and reproduced 
by swapping their chromosomes at a randomly selected 
crossover point. Mutation: individuals in the population 
change the partition group to which one of their nodes be¬ 
longs with a probability of 1%. 

Both of these genetic operators are guaranteed to produce 
valid individuals since all partitions of the PTA represent a 
valid solution for the formulated problem. 
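
The operators can be sketched in plain Python as below. This is not the DEAP configuration used in the experiments, which the text does not give; the function names are hypothetical, and the 1% mutation rate is read here as a per-individual probability, which is one possible interpretation.

```python
import random

def tournament(population, fitness, rng, size=3):
    """Pick the best (lowest-scoring) of `size` randomly drawn individuals."""
    contenders = rng.sample(population, size)
    return min(contenders, key=fitness)


def one_point_crossover(a, b, rng):
    """Swap the tails of two equal-length label strings at a random point."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(individual, rng, n_groups, rate=0.01):
    """With probability `rate`, move one randomly chosen node to a random group."""
    if rng.random() < rate:
        genes = list(individual)
        genes[rng.randrange(len(genes))] = rng.randrange(n_groups)
        return tuple(genes)
    return individual


rng = random.Random(0)
parent_a = (0, 0, 0, 0, 0, 0)
parent_b = (1, 1, 1, 1, 1, 1)
child_a, child_b = one_point_crossover(parent_a, parent_b, rng)
mutant = mutate(parent_a, rng, n_groups=6, rate=1.0)
```

Any string of group labels is a valid partition of the PTA nodes, so neither operator needs a repair step.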

Fitness Function 

Individuals are evaluated by a combined measure of the un¬ 
derlying DFA they represent, given by its complexity and 
inconsistency. The goal of the GA is to minimise this com¬ 
bined score. 

The complexity of a DFA was measured following the 
Minimum Description Length (MDL) principle. The MDL 
principle states that the best solution for a given set of data 
is that which minimises the encoding of the data (Rissanen, 
1978); when applied to grammatical inference, this means 
that the best hypothesis for some observed data S+ is that 
which minimises the encoding of the grammar and its pars¬ 
ing of the data (De la Higuera, 2010). Specifically, we de¬ 
fined our complexity function for a DFA A as described in 
Equations 1 and 2. 

$$\text{complexity} = |Q| \times |\Sigma| + \sum^{s} d(q_i) \qquad (1)$$

$$d(q) = \begin{cases} \log\left(|\{a \in \Sigma : \delta(q, a) \text{ is defined}\}|\right) & \text{if } q \in F_R \\ \log\left(1 + |\{a \in \Sigma : \delta(q, a) \text{ is defined}\}|\right) & \text{if } q \in F_A \end{cases} \qquad (2)$$

In Equation 1, $|Q|$ denotes the number of states of $A$, $|\Sigma|$ 
denotes the size of the alphabet, and the summation is the sum of the 
respective complexities $d(q_i)$ of each state $q_i$ visited while 
parsing a sample $s$. Equation 2 represents the number of 
possible decisions presented in each state $q$, as signified by 
the number of exiting edges for the state, plus one when the 
state is an acceptor, as ending parsing (and thus accepting 
the string $s$) represents an extra option. Note that states with 
only one option (i.e. only one exiting edge, or an acceptor 
state with no exiting edges) will have $d(q) = \log(1) = 0$. 
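
As a worked illustration of Equations 1 and 2, the sketch below scores a small DFA on one sample. Several details are assumptions not fixed by the text: the logarithm is taken base 2, every state on the parse path (including the initial and the final one) counts as visited, the parsing term is summed over all samples supplied, and a non-accepting state with no exits contributes zero.

```python
import math

def state_cost(state, transitions, accepting, alphabet):
    """d(q): log of the number of options available in state q."""
    options = sum(1 for a in alphabet if (state, a) in transitions)
    if state in accepting:
        options += 1                  # stopping here is one more option
    return math.log2(options) if options > 0 else 0.0


def complexity(n_states, transitions, accepting, alphabet, samples, initial=0):
    """|Q| x |Sigma| plus the summed d(q) over states visited while parsing."""
    total = float(n_states * len(alphabet))
    for seq in samples:
        state = initial
        total += state_cost(state, transitions, accepting, alphabet)
        for symbol in seq:
            state = transitions[(state, symbol)]
            total += state_cost(state, transitions, accepting, alphabet)
    return total


# Five-state DFA accepting "ah ai ah aj" and "ah ai en aj".
transitions = {(0, "ah"): 1, (1, "ai"): 2, (2, "ah"): 3,
               (2, "en"): 3, (3, "aj"): 4}
alphabet = {"ah", "ai", "aj", "en"}
score = complexity(5, transitions, {4}, alphabet, [("ah", "ai", "ah", "aj")])
```

Here the structural term is 5 × 4 = 20 and only state 2, with two exiting edges, contributes to the parsing term (log2 2 = 1), giving a score of 21.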

The inconsistency of a DFA is measured as the proportion 
of negative samples it accepts. 

$$\text{inconsistency} = \frac{|B_{acc}|}{|B_-|} \qquad (3)$$

In Equation 3, $|B_{acc}|$ denotes the number of negative samples 
accepted and $|B_-|$ the total number of negative samples 
presented. 

Two fixed-length random subsamples from S+ and S- are 
thus used each generation to evaluate, respectively, the 
complexity and inconsistency of the individuals in the population. 
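
Equation 3 and the per-generation subsampling can be sketched as follows. The helper names are hypothetical, the subsample size is left as a parameter because the text does not report it, and a missing transition is assumed to mean rejection.

```python
import random

def accepts(transitions, accepting, sequence, initial=0):
    """Parse a sequence; an undefined transition is treated as rejection."""
    state = initial
    for symbol in sequence:
        if (state, symbol) not in transitions:
            return False
        state = transitions[(state, symbol)]
    return state in accepting


def inconsistency(transitions, accepting, negatives):
    """Proportion of the presented negative samples that are accepted."""
    accepted = sum(1 for seq in negatives if accepts(transitions, accepting, seq))
    return accepted / len(negatives)


def fixed_subsample(samples, size, rng):
    """Draw a fixed-length random subsample without replacement."""
    return rng.sample(samples, min(size, len(samples)))


# Five-state DFA accepting "ah ai ah aj" and "ah ai en aj".
transitions = {(0, "ah"): 1, (1, "ai"): 2, (2, "ah"): 3,
               (2, "en"): 3, (3, "aj"): 4}
negatives = [("ah", "ai", "en", "aj"), ("ai", "ah"), ("aj",), ("ah", "ai", "ah", "aj")]
score = inconsistency(transitions, {4}, negatives)
batch = fixed_subsample(negatives, 2, random.Random(0))
```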

Results 

Our approach proved to be successful at continuously 
decreasing the complexity of the target solution while also 
lowering the proportion of negative samples wrongly accepted. 
Figures 6 and 7 (a) and (b) show the box plot distributions 
of the complexity and inconsistency scores for the population 
of possible solutions at each generation. For both the gay 
and meadow individuals a steady decline in inconsistency 
can be seen. However, the decrease in complexity is 
relatively small from generation to generation, making it 
necessary to significantly increase either the size of the initial 
population or the total number of generations the algorithm 
is allowed to run. Each of these options implies a significant 
overhead in computational cost due to the required 
manipulation of automata with a high number of nodes. 

To overcome this we instead opted for an alternative 
iterative approach. Starting with an initial population of size 
p = 100, after n = 10 generations the best solution found 
was used as the starting point for generating the population 
of a subsequent run, or round, of the GA, with an initial 
population of p = p + 50. After n = n + 5 generations, the 
best solution found is once again used as the starting point 
for another round of the GA. These values were determined 
empirically and chosen to match the available computational 
resources. 
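
The round schedule described above can be written down as a short sketch. The function names are hypothetical, and `run_ga` stands for any routine that runs one round of the GA from a given starting automaton and returns the best solution found; its internals are not specified here.

```python
def round_schedule(n_rounds, p0=100, n0=10, p_step=50, n_step=5):
    """Population size and generation count for each round."""
    return [(p0 + r * p_step, n0 + r * n_step) for r in range(n_rounds)]


def iterate_rounds(start, run_ga, n_rounds=3):
    """Run successive rounds, seeding each with the previous best solution."""
    best = start
    for pop_size, generations in round_schedule(n_rounds):
        best = run_ga(best, pop_size, generations)
    return best


# Placeholder GA that only records how it was called.
calls = []
def fake_run_ga(start, pop_size, generations):
    calls.append((start, pop_size, generations))
    return start + 1

final = iterate_rounds(0, fake_run_ga, n_rounds=3)
```

With three rounds this gives populations of 100, 150 and 200 individuals run for 10, 15 and 20 generations respectively.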

Figures 6 and 7 (c) and (d) show the box plot distributions 
of the complexity and inconsistency scores for each 
generation of the second round. Both scores follow a similar 
tendency to the previous round, with complexity scores 
improving slightly at each generation, while inconsistency 
scores show a speedier improvement. Figures 6 and 7 (e) and 
(f) show the results for a third round. Although the overall 
best possible solutions belonged to populations in this round, 
several individuals in those populations suffered from 
overgeneralisation, as demonstrated by the increased ratio of 
incorrectly accepted negative samples and significantly low 
complexities. 

A drawback to this approach is that it limits the search 
space on each iteration, from the lattice over the PTA to the 
lattices over the partial solutions. In doing so, better 
solutions might become completely inaccessible to the search 
procedure. However, it also offers the advantage of arriving 
at better solutions faster, expending fewer computational 
resources. 

Figures 8 (a) and (b) show how the most significant 
decreases in the number of states for the best solution found 
were a direct result of the start of a new round (denoted by 
the dotted lines). Likewise, Figures 8 (c) and (d) show the 
same behaviour for complexity scores. The same cannot be 
said for the inconsistency scores shown in Figures 8 (e) and (f), 
where rounds one and two follow a more continuous decline 
until round three, where the overgeneralisation of the 
potential solutions causes a sudden increase. 

Figure 6: Box plot distribution of complexity and inconsistency 
scores across all individuals in the population throughout 
each generation after one (a and b), two (c and d), and three 
(e and f) rounds for the ayo individual using negative sample $S_{-1}$. 
The best solution found for each generation is marked with 
an x. 

Figure 7: Box plot distribution of complexity and inconsistency 
scores across all individuals in the population throughout 
each generation after one (a and b), two (c and d), and three 
(e and f) rounds for the meadow individual using negative sample 
$S_{-2}$. The best solution found for each generation is marked 
with an x. 


Discussion and Future Work 

In this work, we explored the capabilities of a genetic 
algorithm coupled with the minimum description length 
principle to model an approximate representation of the syntactic 
structure governing CaVi’s songs. Moreover, we analysed 
the results in terms of complexity and inconsistency of the 
evolved models. 

In the experiments using random negative 
examples, generated by sampling both phrases and their position in 
the sequence from uniform distributions, we observed a 
considerable reduction in complexity as a result of the 
evolutionary process. The inconsistency of the model was also 
reduced considerably, often maintaining this tendency even 
with models of reduced complexity. 

However, the syntactic models that were evolved using 
negative examples in which phrases were sampled from the 
phrase distribution of the positive examples at random 
positions began suffering from overgeneralisation as the 
complexity decayed. 

The choice of a DFA as the target representation was 
based on its simplicity, as there is no definite knowledge of 
the underlying complexity of the CaVi’s syntax. The inability 
of DFAs to improve consistency when there is some degree 
of similarity between the positive and negative examples 
suggests that more complex models, such as probabilistic 
DFAs, pushdown automata or linear bounded automata, 
should be considered in the future. Similarly, this method 
uses exclusively a genetic algorithm to search for possible 
solutions; however, other search techniques, 
such as tabu search, could be used instead. In the future 
we will experiment with different combinations of target 
representations and search mechanisms and compare their 
strengths and weaknesses with each other and with other 
existing methods. 

Several obstacles hinder the study of this problem. Data 
availability remains a major concern, as the process required 
for gathering and preprocessing is extremely time consum¬ 
ing. This lower availability of data complicates the usage 
of already established techniques for grammatical inference 
as the field is more oriented towards human languages, for 
which bigger data sets are easier to come by. 
Figure 8: Number of states in best solution per generation 
for four different individuals: ayo, gay, meadow and gate. 
Dotted lines along the y axis represent different rounds of the 
GA. (a), (c), and (e) used negative samples $S_{-1}$; (b), (d), 
and (f) used negative samples $S_{-2}$. 
With 302 neurons and a fully reconstructed connectome, 
Caenorhabditis elegans is an ideal candidate organism to 
study how behavior is grounded in the interaction between 
an organism’s brain, its body, and its environment. Since 
nearly its entire behavioral repertoire is expressed through 
movement, understanding the neuromechanical basis of 
locomotion is especially critical as a foundation upon which 
analyses of all other behaviors must build. In this extended 
abstract, we report on the evolution and analysis of an 
integrated neuromechanical model of forward locomotion. 

C. elegans locomotes in an undulatory fashion, generat¬ 
ing thrust by propagating dorso-ventral bends along its body. 
How the rhythmic patterns are generated and propagated is 
not yet understood. We focus here on the propagation of the 
dorsoventral body bend along the body. 

To date there have been a handful of models of forward lo¬ 
comotion (see reviews by Gjorgjieva et al. (2014) and by Co¬ 
hen and Sanders (2014)). However, recent experimental 
analysis of the structure of ventral cord circuitry (Haspel and 
O’Donovan, 2011) and the effect of local body curvature on 
nearby motor neurons (Wen et al., 2012) undermine some of 
the assumptions of these models. Furthermore, all current 
models have assumed specific answers to how the rhythmic 
movement is propagated, with little systematic exploration 
of the possibilities. 

First, we reconstructed a biomechanical model of the 
worm’s body and musculature from published descriptions 
(Boyle et al., 2012). The complete physical model 
consists of a set of 147 stiff, highly nonlinear 
differential-algebraic equations (Fig. 1A). Second, we developed a 
neural model of the ventral nerve cord subcircuit associated with 
forward locomotion, comprising four main classes of motor 
neurons: 12 neurons each of classes VB and VD, and 6 neurons 
each of classes DB and DD (Fig. 1B), separated into 6 repeating 
neural units derived from a statistical analysis of the 
connectome (Haspel and O’Donovan, 2011). Following previous 
work, neurons were modeled as passive, isopotential 
components, and the model includes chemical and electrical 
synapses (Izquierdo and Beer, 2013). Third, we incorporated 
stretch-receptors innervating B-motoneurons from 
anterior body segments (Fig. 1B), based on recent findings 
(Wen et al., 2012). The generation of the rhythmic wave 
was modeled as originating in the head. 





Figure 1: Neuromechanical model. (A) Complete physical 
model (i) and one of 49 individual segments (ii) adapted 
from (Boyle et al., 2012). (B) One of 6 repeating neuro¬ 
muscular units, derived from a statistical analysis of the con¬ 
nectome (Haspel and O’Donovan, 2011). Each unit includes 
one dorsal and two ventral B- (cholinergic, blue) and D-class 
(GABAergic, magenta) motor neurons that connect to mus¬ 
cles (gray) on each side. The model includes all chemical 
synapses (blue excitatory and magenta inhibitory), gap junc¬ 
tions (green), and neuromuscular junctions (dashed). Addi¬ 
tionally, B-class neurons receive stretch-receptor input from 
anterior muscles (black) (Wen et al., 2012). 

Altogether, the model included 21 unknown electrophysiological 
parameters. An evolutionary algorithm was used 
to determine values of the unknown parameters that optimized 
behavioral performance. Solutions were evaluated on 
how closely they matched the speed of the worm on agar. 
We ran 100 evolutionary runs and consistently found 
electrophysiological configurations that produced realistic 
control of movement when coupled to the biomechanical model 
of the body and situated in a simulated agar environment. 

Each successful search produced a distinct set of param¬ 
eter values, leading to an ensemble of models that are con¬ 
sistent with the known biological constraints. The focus of 
our analysis was first to identify different possible classes 
of solutions through the exploration of electrophysiological 
configurations that produce realistic control of movement. 
The second part of the analysis was to understand the oper¬ 
ation of the highest-performing exemplars of each class. We 
use this insight to propose experiments on the organism that 
test the hypotheses generated by the different classes. 

In all evolved solutions, forward movement is produced 
with each body region alternating between positive and 
negative curvature, and bands of curvature propagating from 
head to tail as shown in a kymogram (Fig. 2A). Emergent 
properties of the evolved networks reproduced key experimental 
observations that they were not designed to fit, including 
the curvature profile of the body’s movement (i.e., 
curvatures near the head larger than curvatures near the tail) 
and the wavelength of the propagating wave. This suggests 
that the model may be operating according to principles 
similar to those of the biological network. 

We analyzed the properties of the entire ensemble of 
solutions using a number of different techniques, including 
neuron recordings (Fig. 2B), neural and behavioral manipulations, 
and lesion studies. An examination of the ensemble 
revealed two broad classes of solutions: some where the 
D-class motoneurons did not play a role in forward locomotion 
and some where they did. All of the evolved solutions relied 
primarily on stretch-reception. Further analysis of the 
operation of these networks reveals the roles that the individual 
neural, synaptic and proprioceptive components of this system 
play in propagating and coordinating rhythmic undulatory 
waves from head to tail during locomotion. 

C. elegans offers a unique opportunity to obtain a complete 
systems-level understanding of a locomotory circuit. 
As we better understand the operation of the ensemble of 
integrated neuromechanical models of locomotion, the insights 
will be used to propose novel experiments on the living 
organism that test the hypotheses generated by the different 
classes of solutions. The results of such neurobiological 
experiments can be used to constrain subsequent iterated 
optimizations, ultimately improving our understanding of the 
biological system, and more generally the generation of 
behavior in a coupled brain-body-environment system. 

Acknowledgment: This work was supported by NSF grants 
IIS-1216739 and IIS-1524647. 
Figure 2: Characterizing evolved solutions. (A) Kymogram 
of time-varying curvature illustrating retrograde bending 
waves along the simulated worm (head = 0; tail = 1) 
responsible for forward movement. Wavelength and frequency 
in the model are similar to what has been observed in the worm. 
(B) Neural activity in the Dorsal (top) and Ventral (bottom) 
B-class motoneuron for each of the six different neural units 
along the body (lighter shades of gray represent neurons in 
units closer to the tail). Neural traces in the evolved circuits 
illustrate: (i) rhythmic patterns that are propagated anteriorly 
through the neural units with a phase lag, and (ii) anti-phase 
patterns of activity in dorsal and ventral units. 
Abstract 

We present an interactive, agent-based, multi-scale 3D model 
of a colony of E. coli bacteria. We simulate chemical diffu¬ 
sion on an agar plate which is inhabited by a colony of bac¬ 
terial cells. The cells interact with a discrete grid that models 
diffusion of attractants and repellents, to which the cells react. 
For each bacterium, we simulate its chemotactic behaviour, 
making a cell either follow a gradient or tumble. Cell propul¬ 
sion is determined by the spinning direction of the motors that 
drive its flagella. 

In an agent-based model, we have implemented the molecular 
elements that comprise the two key chemotactic pathways of 
excitation and adaptation, which, in turn, regulate the motors 
and influence a cell’s movement through the agar medium. 
We show four interconnected model layers that capture the 
biological processes from the colony layer down to the level 
of interacting molecules. 


Introduction 

We have implemented a model of a colony of bacterial 
cells, which we can visualize and interact with at four dis¬ 
tinct, yet computationally interconnected levels (Fig. 1). At 
the “naked eye” level we model the gradient of a diffusing 
chemical signal, similar to what one observes in a laboratory 
by looking at an agar plate inhabited by a bacterial colony. 
Once we zoom closer into the plate, the colony of bacteria 
becomes visible, which reflects the simulated behaviours of 
cell clusters. Picking one of the cells transitions to a close- 
up view of an individual bacterium, with its flagella pro¬ 
pelling it through the medium. As the last model level, we 
can navigate into a bacterium’s cytoplasm, where we have 
implemented the molecular signalling pathways that drive 
chemotaxis. 


E. coli and Chemotaxis 

The prokaryotic cell we have modelled is known as Escherichia
coli. Most strains of E. coli, as it is known for short,
are harmless. We have billions of these bacteria naturally
residing within our intestinal tracts (Zimmer, 2009). E. coli
has been at the centre of many biological discoveries due
to its ease of growth, its adaptability to different conditions,
and the ease of manipulating its genome (Berg, 2004).

Figure 1: Snapshots of the four layers of resolution within
our multi-scale E. coli model. Panels: Eye view (agar plate),
Colony view (cell cluster), Cell view (E. coli), and
Molecular view (cytoplasm).


Chemotaxis is a universal attribute of motile cells and
organisms. It is the mechanism that dictates their movement
in the presence of a stimulus (Wadhams and Armitage, 2004).
The stimulus, often chemical (hence “chemo”), is
either an attractant or a repellent (Berg, 2004). A cell like
E. coli moves up the concentration gradient towards
an attractant source (Fig. 8A), thereby exhibiting a positive
chemotactic response. Correspondingly, chemo-repellents
cause the organism to turn away from the stimulus, thereby
exhibiting a negative chemotactic response (Fig. 8B). Starting
from a center point, a typical E. coli colony expands in
an elliptical pattern known as chemotactic rings (Fig. 2).

Figure 2: A) A tryptone soft agar plate in which motile
cells can swim through water-filled tunnels in the agar. Two 
chemotactic colonies are shown. As the cells grow, they es¬ 
tablish attractant gradients as they consume energy sources 
saturated in the agar. Printed with permission from Dr. John 
S. Parkinson Lab. B) Simulated chemotactic gradient in the 
agar layer. C) Closeup of the chemotactic ring formed by 
the simulated bacteria colony. 

Chemotaxis Receptors 

Bacteria use specific receptors to recognize chemical stim¬ 
ulants in their environment (Adler, 1966a,b, 1969, 1973; 
Adler et al., 1973). Five different receptors known as the 
methyl-accepting chemotaxis proteins (MCPs) play a key 
role in the signalling pathways (Berg, 2004): Tsr, Tar,
Tap, Trg, and Aer. Each MCP detects a different chemical.
MCPs are usually bundled into clusters at the poles of
a bacterium (Adler, 1969; Sourjik, 2004). For simplicity, in 
our model we include one generic receptor which subsumes 
all MCP properties. 

Chemotaxis Pathways 

E. coli’s chemotactic pathway comprises two distinct
networks (Hauri and Ross, 1995): (1) the Signal Transduction
Cascade, which leads to an excitation reaction, and
(2) the Methylation Response, which results in adaptation
(Berg, 2004; Hauri and Ross, 1995; Wadhams and Armitage,
2004). The resulting chemotactic behaviour is a series of runs
and tumbles that the bacterium performs throughout its life span.

A bacterium is propelled by flagella filaments (Fig. 3) that
extend from its cellular membrane (Alon, 2007). Each flagellum
is controlled at its base by a motor inside the cell
membrane. The direction in which the motors rotate determines
whether the bacterium “runs” or “tumbles”. A run
is defined by the bacterium swimming in a forward motion.
When the motor rotates counter-clockwise, the flagella
bundle together to propel the bacterium forward (Berg,
2004). During the tumble phase, the motor rotates clockwise
(Berg, 2004), which causes the flagella to break from
the bundle, resulting in a random change of direction. In the
absence of any stimulus, runs and tumbles alternate, which
leads to a random walk for the bacterium.

Figure 3: A) When the flagella filaments rotate counter-clockwise,
they bundle together and propel the bacterium
forward. A clockwise motor rotation makes the bacterium
tumble, as the flagella bundles break up. B) Location of the
motor unit in our E. coli model. C) The motor complex inside
the modelled cytoplasm.

Protein   Role
CheA      Phosphorylation protein
CheB      De-methylation protein
CheY      Signal transmission protein
CheZ      De-phosphorylation protein
CheR      Methylation protein
CheW      Binds with CheA to form CheAW complex

Table 1: Proteins and their roles in the chemotaxis pathway

In the presence of a stimulus, whether attractant or
repellent, the rotation bias of the flagellar motors changes.
Molecules of the stimulus are picked up by the receptors
protruding from the cell membrane, which triggers the
excitation and adaptation responses. E. coli constantly
compares the concentration at its current location to that at
its previous location, thus implementing a short-term
“memory” that compares present and past information
(Segall et al., 1986).
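
The run-and-tumble behaviour with temporal comparison described above can be illustrated with a minimal Python sketch. It is not the implementation used in our model: the attractant field, the tumble probabilities, the step size, and all function names are assumptions chosen only to show how comparing the current concentration with the previous one biases an otherwise random walk.

```python
import math
import random


def attractant(x, y):
    """Hypothetical attractant field that peaks at the origin."""
    return math.exp(-(x * x + y * y) / 200.0)


def run_and_tumble(steps, rng, start=(20.0, 0.0), speed=1.0,
                   p_tumble_up=0.1, p_tumble_down=0.5):
    """Simulate one cell; return its final (x, y) position.

    The cell remembers the concentration at its previous location.
    If the concentration has increased it tumbles rarely (long runs);
    otherwise it tumbles often and picks a new random heading.
    """
    x, y = start
    heading = rng.uniform(0.0, 2.0 * math.pi)
    previous = attractant(x, y)
    for _ in range(steps):
        # Run: move one step along the current heading.
        x += speed * math.cos(heading)
        y += speed * math.sin(heading)
        current = attractant(x, y)
        # Short-term "memory": compare present and past concentration.
        p_tumble = p_tumble_up if current > previous else p_tumble_down
        if rng.random() < p_tumble:
            # Tumble: random change of direction.
            heading = rng.uniform(0.0, 2.0 * math.pi)
        previous = current
    return x, y


def mean_final_distance(n_cells, steps, seed=0):
    """Average distance from the attractant source over several cells."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_cells):
        x, y = run_and_tumble(steps, rng)
        total += math.hypot(x, y)
    return total / n_cells
```

Setting `p_tumble_up` equal to `p_tumble_down` removes the bias and yields the unbiased random walk expected in the absence of a stimulus.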

Table 1 summarizes the proteins involved in the chemotaxis
signalling pathways in E. coli’s cytoplasm. Each protein
plays a specific role in one of the two response networks.

Excitation Response: Signal Transduction Cascade

The excitation response directly affects the motion of the
bacterium, where a series of signals is transferred downstream
from the receptor to a flagellum motor (Fig. 4A).
To keep our model simple, we assume that receptors can
only be active or inactive. CheAW, a complex formed by
CheA and CheW, is bound to the receptor end inside the cell
membrane. With an increase in attractant concentration, the
receptor becomes inactive, thus increasing CheY concentration,
making the motor turn counter-clockwise, and suppressing
the bacterium’s tumble motion (Miller et al., 2010).

Figure 4: Chemotactic pathways: schematics and translations into our agent-based E. coli model. A.1) Excitation; A.2) agent
interactions for excitation; B.1) Adaptation; B.2) agent dynamics during adaptation response.

Active CheAW phosphorylates both CheB and CheY.
CheB plays a role in adaptation (see below). CheZ’s role
is to de-phosphorylate CheY-P. The rotation of the flagella
is dependent on the concentrations of CheY and CheY-P.
A counter-clockwise rotation results from a higher CheY
concentration. Likewise, motor rotation occurs in a clockwise
fashion when CheY-P is present at a higher concentration
than (unphosphorylated) CheY. Increasing attractant
or decreasing the repellent concentration will bias the
bacterium into swimming in smooth arcs by suppressing CheY
phosphorylation, and thereby increasing the concentration
of CheY. Similarly, an increase in repellent (or decrease in
attractant) results in an increase of CheY-P, which leads to
more frequent tumbling.
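
The dependence of motor rotation on the CheY / CheY-P balance can be sketched in a few lines of Python. This is a deliberately simplified, deterministic illustration and not our agent-based implementation: the rate constants `k_phos` and `k_dephos`, the function names, and the choice to treat a tie as counter-clockwise are all assumptions.

```python
def pathway_step(chey, chey_p, receptor_active, k_phos=0.2, k_dephos=0.1):
    """Advance the CheY / CheY-P pools by one time step.

    Active receptors (via CheAW) phosphorylate CheY; CheZ always
    de-phosphorylates CheY-P. The total amount of CheY is conserved.
    """
    phosphorylated = k_phos * chey if receptor_active else 0.0
    dephosphorylated = k_dephos * chey_p
    chey = chey - phosphorylated + dephosphorylated
    chey_p = chey_p + phosphorylated - dephosphorylated
    return chey, chey_p


def motor_rotation(chey, chey_p):
    """Clockwise when CheY-P dominates, counter-clockwise otherwise."""
    return "CW" if chey_p > chey else "CCW"


def behaviour(chey, chey_p):
    """Clockwise rotation makes the cell tumble; counter-clockwise, run."""
    return "tumble" if motor_rotation(chey, chey_p) == "CW" else "run"


def simulate(chey, chey_p, receptor_active, steps):
    """Apply pathway_step repeatedly with a fixed receptor state."""
    for _ in range(steps):
        chey, chey_p = pathway_step(chey, chey_p, receptor_active)
    return chey, chey_p
```

With these assumed rates, an active receptor (no attractant) drives the pools towards a CheY-P majority and tumbling, whereas an inactive receptor (attractant bound) lets CheZ deplete CheY-P so that the cell runs.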


Adaptation Response: Methylation

Adaptation is the process of the cell returning to its normal
state of behavior (Fig. 4B). Without methylation, the cell
would continuously travel in a straight line regardless of
the conditions. Methylation ensures that there is a recovery
condition in the cell so that it may continue to query its
vicinity for a more favourable location to travel towards.

Methylation involves the CheA-CheW complex again
(CheAW), as well as CheR and CheB. CheB is phosphorylated
by CheAW into CheB-P, which subsequently removes
a methyl group from the receptor (Berg, 2004; Adler et al.,
1973; Patnaik, 2007). This process is called de-methylation,
after which CheB-P returns to its unphosphorylated state
CheB. Regardless of conditions inside or outside of the cell,
CheR keeps methylating the receptor, which reactivates it.
Consequently, CheB-P and CheR alternate in deactivating
and reactivating the receptors.

Models of Bacterial Chemotaxis

There are two key methods for modelling biological systems:
mathematical or agent-based approaches. The method
of choice depends on the system or process being modelled.
In recent years, a new method known as hybrid modelling is
being utilized, which is precisely what we are using to create
our chemotaxis multi-scale model. This method combines
both mathematical and agent-based models and automatically
switches between these two techniques as required.

Following a hybrid approach for our E. coli model, we 
use mathematical equations to track diffusion of attractants 
and repellents as well as the concentration of cells in an 
agar plate environment. In order to simulate single-cell be¬ 
haviours and bio-molecular interactions inside a cell, we use 
an agent-based model, where we track movements and col¬ 
lisions of elements in 3-dimensional cytoplasmic space. In 
the following sections, we give a brief overview of related 
modelling work for E. coli chemotaxis. 

Mathematical Models of Chemotaxis 

Many models of chemotaxis in E. coli have been developed 
over the last few decades, which differ in their comprehen¬ 
siveness and coverage of pathways (Fernando, 2005). Some 
models capture general principles, while others focus on par¬ 
ticular pathway parts. In fine-tuned models, biochemical 
parameters are accurately replicated, whereas more robust 
models replicate a wide range of parameters (Alon, 2007). 

In the 1970s, Keller and Segel (1971) proposed a mathe¬ 
matical model that was originally developed to analyze the 
movement of slime molds. Over many years, their model 
has served as the foundation for modelling chemotaxis at the 
population level. Knox et al. (1986) proposed a theoretical 
model for the adaptation process. This model has formed 
the basis for later theoretical work on adaptation response 
and is often replicated in fine-tuned models (Alon, 2007). 

Bray et al. developed a computational chemotaxis model
(Bray et al., 1993), the Bacterial Chemotaxis Program
(BCT), which initially modelled only the excitation response.
BCT was later extended with a more biologically accurate
representation of the receptor complexes (Bray and Bourret,
1995), binding affinities optimized by an evolutionary
algorithm (Fernando, 2005), and receptor clustering and
sensitivity (Bray et al., 1998). Bray et al. also
created E. solo (Bray and Lipkow, 2007), using ordinary
differential equations to replicate the signaling reactions in the
pathway. E. solo provides a graphical display of bacterial 
movement in a 2D environment. 

Barkai and Leibler (1997) presented a model for the
adaptation response in chemotaxis, which takes a wide range 
of possible values for biochemical parameters into account. 
Their model includes several methylation sites, and repro¬ 
duces many observations on the dynamical chemotactic be¬ 
haviour of cells (Barkai and Leibler, 1997). Using a three- 
component model they showed that the adaptation process 
is robust rather than fine-tuned (Alon, 2007). 

In 1999, the BCT team and Morton-Firth et al. developed
StochSim, the first stochastic simulation model of bacterial
chemotaxis (Morton-Firth et al., 1999). StochSim incorporates
both excitation and adaptation responses. Smoldyn, an
extension of StochSim, simulates cell-scale biochemical
reactions to capture natural stochasticity. The program
was developed to provide a more realistic way to simulate
the diffusion of signaling molecules through the cytoplasm
(Andrews and Bray, 2004). StochSim was further expanded
by Emonet et al. (2005) to develop AgentCell, which utilizes
agent-based modelling to represent chemotactic responses
at the population and single-cell level, simulated independently.
AgentCell accurately reproduces validated results
under both stimulated and unstimulated conditions.

A more recent framework to specify and simulate micro¬ 
colony growth and molecular signaling for synthetic biology 
applications was developed by Jang et al. (2012). 

Agent Based Models (ABM) 

In agent-based approaches one simulates the interactions of 
elements (“agents”) with other elements and their environ¬ 
ment. These interactions often give rise to complex pat¬ 
terns, referred to as emergence (Macal and North, 2005; 
Bonabeau, 2002). Emergent properties are often not iden¬ 
tifiable by looking at the individual agents, but evolve from 
interactions between agents, as described by Ginovart et al. 
(2002) for discrete simulations of bacterial cultures.

Most modelers consider any independent component 
(software, object, model, etc.) with some sort of defined 
(programmed) behaviour rules to be an agent. The behaviour 
can range from primitive reactive protocols to adaptive in¬ 
telligence programs (Mellouli et al., 2003). Berry (1997) 
proposed that an agent should contain both base-level rules 
for its behaviour and higher-level protocols to “change the
rules” (adaptive intelligence). The base-level rules provide a
reaction to the environment, whereas the higher-level proto¬ 
cols provide adaptation to the environment. This is the agent 
definition we have followed in our E. coli model. 

Hybrid Chemotaxis Models 

The use of hybrid models, which combine mathematical
modelling with ABM techniques, is becoming widespread,
especially with the growth in computational power. This
allows more complex systems, such as biological systems
and their cellular processes, to be modelled and simulated.
Hybrid modelling has been applied to tumor growth (Patel
et al., 2001) and forest dynamics (Landsberg and Waring,
1997), as well as to chemotaxis in slime molds (Dallon and
Othmer, 1997) and bacteria (Fernando, 2005).

A Hybrid, Multi-scale Model of Chemotaxis 

As an extension of Prokaryo, a hybrid model of prokaryotic
gene regulation (Esmaeili et al., 2015), we have developed
a generalized model of E. coli chemotaxis. The model
captures the key attributes and characteristics of the
chemotaxis pathways which control the locomotion of bacteria in
a simulated agar. The mathematical model handles all of the
calculations related to intracellular and extracellular signals
as well as concentrations of molecules. The ABM handles
interactions among cells (as individual agents on the agar)
and among molecules in the cell’s cytoplasm. By zooming
in and out of the 3D scenario, the scales for the model and
its visualization automatically transition between four layers
(Fig. 1), which includes switching from ABM calculations
to the mathematical model (and back). The mathematical
model is always executed in the background; this ensures
that information is shared between the layers.

Figure 5: Software architecture and communication hierarchy
between the components in the E. coli model.

Model Architecture 

We have used the latest version of our LINDSAY Composer 
2.0 agent simulation software (Jacob et al., 2012) to imple¬ 
ment the multi-scale E. coli model. LINDSAY Composer 
provides 3D simulation, including physics and graphics en¬ 
gines, camera navigation, interactive parameter manipula¬ 
tion, scene hierarchies, and an object-oriented programming 
environment. The architecture of our model consists of the 
following components (Fig. 5): 

Model Data: Simulation data is stored and shared among 
the layers through this module, which is only accessible 
through the Interface Controller. 

Naked Eye Layer: In this top-level view, concentrations 
of chemicals (attractants or repellents) as well as colony
distribution are visualized by color gradients, which are drawn
onto the grid surface that represents the agar environment. 
Algorithm 1 is used to update the grid. 

Colony Layer: In this layer, E. coli cells and stimuli are 
represented through particle systems, which are used to gen¬ 
erate the illusion of thousands of cells and molecules in the 
environment. Movement of the particles is governed by Al¬ 
gorithm 1. 


Cell Layer: In order to reach this level, a single cell has 
been selected on the Colony Layer. An E. coli cell is rep¬ 
resented as an agent with its own distinct properties. The 
cell moves based on the other agents and the stimuli in the 
simulation at each time step. 

Molecular Layer: The lowest layer of our simulation
captures the molecular details and interactions of the
chemotaxis pathways. Molecules and proteins are modelled as
individual agents that interact with other agents and the
environment.

Simulation Engine: The Interface Controller and model 
layers are embedded in the simulation environment. Transi¬ 
tions only occur between connected layers. 

Interface Controller: The interface controller relays in¬ 
teractions from the user (gestures, mouse clicks and move¬ 
ments, parameter manipulations) to the model layers and the 
data module. 



Algorithm 1 Update of Petri Dish and Colony Layer

CREATE:
    Initialize 2D grid cells
    attractant := 0; repellent := 0; ecoli := 0
ITERATE:
    for all gcell ∈ Grid do
        for all e_gcell ∈ {attractant, repellent, ecoli} do
            if |e_gcell| ≠ 0 then    ▷ Cell contains element(s)
                Apply Gaussian algorithm    ▷ Diffusion
                Update gcell
            end if
        end for
        if layer == PetriDish then    ▷ Petri dish layer
            isColourCell := true
            Apply colour visuals to gcell
        else    ▷ Colony layer
            isParticlesCell := true
            Generate particles for a cell
        end if
    end for
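
The diffusion part of Algorithm 1 can be illustrated with a short Python sketch. The paper does not specify the “Gaussian algorithm” in detail, so the 3x3 kernel, the absorbing border (material that would leave the grid is dropped), and the function names below are assumptions; the rendering steps (colour cells, particle generation) are omitted.

```python
# Normalised 3x3 Gaussian kernel; the weights sum to 1, so the total
# amount is conserved as long as nothing reaches the grid border.
KERNEL = [
    [1 / 16, 2 / 16, 1 / 16],
    [2 / 16, 4 / 16, 2 / 16],
    [1 / 16, 2 / 16, 1 / 16],
]


def diffuse(grid):
    """Return a new grid after one Gaussian diffusion step."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            value = grid[r][c]
            if value == 0:
                continue  # empty grid cell: nothing to spread
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        out[rr][cc] += value * KERNEL[dr + 1][dc + 1]
    return out


def make_layers(size):
    """CREATE: one zero-initialised grid per tracked element."""
    return {
        name: [[0.0] * size for _ in range(size)]
        for name in ("attractant", "repellent", "ecoli")
    }


def update(layers):
    """ITERATE: diffuse every element grid once."""
    return {name: diffuse(grid) for name, grid in layers.items()}
```

For example, placing a unit of attractant in the centre of a 5x5 grid and calling `update` once leaves a quarter of it in the centre cell and spreads the rest over the eight neighbours.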


Molecular Agents 

All protein models in our simulation have been extracted 
from online protein databases. We have listed the 3D shapes 
and the PDB IDs of our bio-molecular agents in Figure 7. 
Recall the role of each agent from Table 1. In this paper, we 
only have space to illustrate two agents and how we have im¬ 
plemented their behaviour rules. More information is avail¬ 
able on our project website (LindsayVirtualHuman.org). 

Figure 6: A glimpse of the full-scale simulation inside the E. coli cell’s cytoplasm.

Figure 7: 3D structures of the protein agents in our chemotaxis
model as used in the cytoplasm layer (Fig. 6). The
meshes were imported using the associated PDB IDs from
the Protein Databank (www.wwpdb.org).

CheY protein is phosphorylated by the CheAW complex
(Algorithm 2). CheY-P will dock onto the motor unit. Upon
collision with CheZ, CheY-P gets de-phosphorylated and
follows an attractive force towards the CheAW complex, which
will start the interaction loop again.

CheZ is set to be attracted to the motor complex (Algo¬ 
rithm 3). CheZ performs a random walk in the vicinity of the 
motor complex, checking for collisions with phosphorylated 
CheY. Upon collision with a CheY-P agent, CheZ removes 
the phosphate group from CheY, and subsequently enters a 
short period of inactivity. 

An impression of what the simulation looks like—with all 
molecular agents, including water and lactose, interacting 
inside the cytoplasmic space—is depicted in Figure 6. 


Algorithm 2 CheY

CREATE:
    set state to RandomWalk
ITERATE:
    if CheY is Active then
        if CheY is not phosphorylated then
            if collided with CheAW complex then
                set state to phosphorylated
                set agent conformation to CheY-P mesh
                set attraction to Motor
            else
                set state to RandomWalk
            end if
        else    ▷ CheY is phosphorylated
            if collided with Motor then
                set state to BoundToMotor
                clear movement velocities
                if collided with CheZ then
                    set state to not phosphorylated
                    set agent conformation to CheY mesh
                    set attraction towards CheAW Complex
                end if
            end if
        end if
    else
        set state to RandomWalk
    end if
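
The behaviour rules for CheY and CheZ can be sketched as two small Python state machines. This is an illustrative approximation rather than the scripts used in LINDSAY Composer: collisions are passed in as flags instead of being detected by a physics engine, mesh changes and velocities are omitted, the de-phosphorylation is performed by the CheZ agent alone, and all class and attribute names are assumptions. The 100-step inactivity timer follows Algorithm 3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheY:
    active: bool = True
    phosphorylated: bool = False
    state: str = "RandomWalk"
    attracted_to: Optional[str] = None

    def step(self, hit_cheaw=False, hit_motor=False):
        """One iteration of the CheY rules (collisions given as flags)."""
        if not self.active:
            self.state = "RandomWalk"
            return
        if not self.phosphorylated:
            if hit_cheaw:
                # Phosphorylated by CheAW; now drawn towards the motor.
                self.phosphorylated = True
                self.attracted_to = "Motor"
            else:
                self.state = "RandomWalk"
        elif hit_motor:
            self.state = "BoundToMotor"

    def dephosphorylate(self):
        """Called by CheZ; CheY heads back to the CheAW complex."""
        self.phosphorylated = False
        self.attracted_to = "CheAW"
        self.state = "RandomWalk"  # assumption: resume wandering at once


@dataclass
class CheZ:
    active: bool = True
    timer: int = 100
    attracted_to: str = "Motor"

    def step(self, partner=None):
        """One iteration; `partner` is a CheY agent CheZ collided with."""
        if self.active:
            if partner is not None and partner.phosphorylated:
                partner.dephosphorylate()  # remove P from CheY-P
                self.active = False        # enter inactivity period
        else:
            self.timer -= 1
            if self.timer == 0:
                self.active = True
                self.timer = 100           # restart timer
```

A typical cycle would call `CheY.step(hit_cheaw=True)`, then `CheY.step(hit_motor=True)`, and finally `CheZ.step(partner=...)`, after which the CheZ agent stays inactive for 100 iterations.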


Algorithm 3 CheZ

CREATE:
    set state to RandomWalk, Active
    set attraction towards Motor
    set timer to 100    ▷ Start timer
ITERATE:
    if CheZ is Active then
        if CheZ collided with CheY-P then
            remove P from CheY-P
            set CheZ to ¬Active
        end if
    else
        timer := timer - 1
        if timer == 0 then
            set CheZ to Active
            timer := 100    ▷ Restart timer
        end if
    end if


Simulation Results

We have been using results from our simulations throughout
the illustrations in this paper. Starting with our Eye and
Colony view, we can see that in an environment such as
an agar plate, with no external stimulus, chemotactic rings
are formed (Fig. 2B,C), similar to a wet-lab experiment of
E. coli growing in a tryptone soft agar plate.

When an attractant stimulus is placed in the environment, our
simulated E. coli cells grow and move towards the origin 
of attraction (Fig. 8A). Similarly, with a repellent stimulus 
the colony moves away from the repelling source (Fig. 8B). 

In single cell view, we see an E. coli bacterium following 
a gradient or performing a random walk in the absence of a 
stimulus (Fig. 9). 

On the molecular interaction level, we have replicated the 
different interaction phases of the chemotaxis pathway. Fig¬ 
ure 4 illustrates this with a side-by-side comparison of the 
pathway diagrams and their replication in our agent-based 
model, where the agent behaviours are driven by short code 
scripts, such as the examples of Algorithms 2 and 3. 

Conclusion 

We have introduced a multi-scale, hybrid model that repli¬ 
cates and illustrates chemotaxis of E. coli bacteria. We have 
implemented abstractions of chemotaxis on four levels of 
detail: from the naked eye and colony level down to sin¬ 
gle cells and their cytoplasm. Our model system is interac¬ 
tive, provides 3D visuals, and can serve as a tool to learn 
about and explore behaviours in biological systems arising 
from the interactions of many constituents across a range of 
scales of resolution. More information about this system and 
related simulations can be found on the LINDSAY Virtual 
Human web site. 



Figure 8: Modeled chemotaxis: Movement toward an at¬ 
tractant (A) and away from a repellent stimulus (B). The left 
column shows colony movement (blue), repellent (red) and 
attractant (green); bacterial colony closeups on the right. 

Figure 9: E. coli as a single agent in agar environment.

Abstract

Flies that walk in a covered planar arena on straight paths 
avoid colliding with each other, but which of the two flies 
stops is not random. High-throughput video observations, 
coupled with dedicated experiments with controlled robot 
flies have revealed that flies utilize the type of optic flow on 
their retina as a determinant of who should stop, a strategy 
also used by ship captains to determine which of two ships 
on a collision course should throw engines in reverse. We use 
digital evolution to test whether this strategy evolves when 
collision avoidance is the sole selective pressure. We find 
that the strategy does indeed evolve in a narrow range of 
cost/benefit ratios, for experiments in which the “regressive 
motion” cue is error free. We speculate that these stringent 
conditions may not be sufficient to evolve the strategy in real 
flies, pointing perhaps to auxiliary costs and benefits not mod¬ 
eled in our study. 

Introduction 

How animals make decisions has always been an interesting, 
yet controversial, question to scientists (McFarland, 1977) 
and philosophers alike. Animals obtain various types of 
sensory information from the environment and then process 
these information streams so as to take actions that benefit 
them in survival and reproduction. The visual system plays 
an important role in providing animals information about 
their environment, for example when foraging for food, de¬ 
tecting predators or prey, and when searching for potential 
mates. One of the primary components of visual informa¬ 
tion is motion detection. Motion is a fundamental percep¬ 
tual dimension of visual systems (Borst and Egelhaaf, 1989) 
and is a key component in decision making in most ani¬ 
mals. Here, we study a very particular type of motion de¬ 
tection and concomitant behavior (collision avoidance) in 
Drosophila melanogaster (the common fruit fly), and at¬ 
tempt to unravel the selective (i.e., evolutionary) pressures 
that might have given rise to this behavior. 

D. melanogaster shows a striking difference in behavior 
when exposed to two different types of optical flow. Bran¬ 
son et al. (2009) recorded the interaction of groups of fruit 
flies in a planar covered arena (so that they could only walk, 
not fly) and used computer vision algorithms to analyze the 


walking trajectories in order to study fly behavior. Their 
analysis revealed that female fruit flies stop walking when 
they perceive another fly’s motion from back-to-front in 
their visual field (an optical flow referred to as “regressive 
motion”) whereas they keep walking when perceiving conspecifics
moving from front-to-back in their visual field (referred
to as “progressive motion,” see Figure 1). Zabala et al.
(2012) further investigated this behavior and tested the “re¬ 
gressive motion saliency” hypothesis, suggesting that flies 
stop walking when perceiving regressive motion. They used 
a programmable fly-sized robot interacting with a real fly to 
exclude other sensory cues such as image expansion (“loom¬ 
ing,” see Schiff et al. 1962) and pheromones. Their results 
provide rigorous support for the regressive motion saliency 
hypothesis. 

Subsequently, Chalupka et al. (2015) coined the term 
“generalized regressive motion” for optic flows in which im¬ 
ages move clockwise on the left eye and conversely, coun¬ 
terclockwise on the right eye (see Figure 1). They presented 
a geometric analysis for two flies moving on straight, in¬ 
tersecting trajectories with constant velocities and showed 
that the fly that reaches the intersection first always per¬ 
ceives progressive motion on its retina, whereas the one 
that reaches the intersection later perceives regressive mo¬ 
tion at all times before the other fly reaches the intersection. 
They went on to suggest that this behavior is a strategy to 
avoid collisions during locomotion similar to the rules that 
ship captains use when moving on intersecting paths (see, 
e.g., Maloney 1989). 
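
The geometric argument can be checked numerically with a small Python sketch. It is only an illustration under stated assumptions: the start positions and velocities are invented, and “regressive” is operationalized here as the other fly's bearing (its angle relative to the observer's heading) shrinking in magnitude, i.e., moving from back to front in the visual field. The function names are hypothetical.

```python
import math


def position(start, velocity, t):
    """Position at time t for straight-line motion at constant velocity."""
    return (start[0] + velocity[0] * t, start[1] + velocity[1] * t)


def bearing(observer_pos, observer_heading, target_pos):
    """Signed angle of the target relative to the observer's heading.

    0 means straight ahead; values near +/- pi mean directly behind.
    """
    dx = target_pos[0] - observer_pos[0]
    dy = target_pos[1] - observer_pos[1]
    angle = math.atan2(dy, dx) - math.atan2(observer_heading[1],
                                            observer_heading[0])
    # Wrap into [-pi, pi].
    return math.atan2(math.sin(angle), math.cos(angle))


def perceived_motion(obs_start, obs_vel, tgt_start, tgt_vel, t, dt=1e-3):
    """Classify the optic flow the observer perceives at time t."""
    before = abs(bearing(position(obs_start, obs_vel, t), obs_vel,
                         position(tgt_start, tgt_vel, t)))
    after = abs(bearing(position(obs_start, obs_vel, t + dt), obs_vel,
                        position(tgt_start, tgt_vel, t + dt)))
    # Back-to-front: the target drifts towards the front of the view.
    return "regressive" if after < before else "progressive"


# Two flies on perpendicular paths that intersect at the origin.
# Fly A arrives at t = 1, fly B at t = 2 (assumed example values).
FLY_A = ((-1.0, 0.0), (1.0, 0.0))
FLY_B = ((0.0, -2.0), (0.0, 1.0))
```

In this example, fly A (which reaches the intersection first) perceives progressive motion, while fly B perceives regressive motion, in line with the analysis summarized above.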

As intriguing as this hypothesis may seem, it is not clear 
a priori which selective pressures or environmental circum¬ 
stances could give rise to this behavior. For example, it is 
unclear whether collision avoidance provides a significant 
enough fitness benefit. As a consequence, it is possible that 
the behavior has its origin in a completely different cogni¬ 
tive constraint that is fundamentally unrelated to collision 
avoidance, or to the rules that ship captains use to navigate 
the seas. While such questions are difficult to answer using 
traditional behavioral biology methods, Artificial Life offers 
unique opportunities to test these hypotheses directly. 

Figure 1: An illustration of regressive (back-to-front, left)
and progressive (front-to-back, right) optic flows in a fly’s 
retina. 

In this study, we tested whether collision avoidance can be 
a sufficient selective pressure for the described behavior to 
evolve. We also investigated the environmental conditions 
under which this behavior could have evolved, in terms of 
the varying costs and benefits involved. By using an agent- 
based computational model (described in more detail be¬ 
low), we studied how the interplay (and trade-offs) between 
the necessity to move and the avoidance of collisions can re¬ 
sult in the evolution of regressive motion saliency in digital 
flies. 

Digital evolution is currently the only technique that can 
study hypotheses concerning the selective pressures neces¬ 
sary (or even sufficient) for the emergence of animal behav¬ 
iors, as experimental evolution with animal lines of thou¬ 
sands of generations is impractical. In digital evolution, we 
can study the interplay between multiple factors such as se¬ 
lective pressures, environmental conditions, population size 
and structure, etc. For example, Olson et al. (2013) used dig¬ 
ital evolution to show that predator confusion is a sufficient 
condition to evolve swarming behavior, but they also found 
that collective vigilance can give rise to gregarious foraging 
behavior in group-living organisms (Olson et al., 2015). In 
principle, any one hypothesis favoring the emergence of be¬ 
havior can be tested in isolation, or in conjunction (Olson 
et al., 2015). 

Methods 

Markov Networks 

We use an agent-based model to simulate the interaction 
of walking flies with moving objects (here, potentially con- 
specifics) in a two-dimensional virtual world. Agents have 
sensors to perceive their surrounding world (details below) 
and have actuators that enable them to move in the envi¬ 
ronment. Agent brains in our experiment have altogether 
twelve sensors, three internal processing nodes, and one out¬ 
put node (the actuator). The brain controlling the agent is 
a “Markov network brain” (MNB), which is a probabilis¬ 
tic controller that makes decisions based on sensory inputs 
and internal nodes (Edlund et al., 2011). Each node in the
network (i.e., sensors, internal nodes, and actuators) can
be thought of as a digital (binary) neuron that either fires
(value=1), or is quiescent (value=0). Nodes of the network
are connected via Hidden Markov Gates (HMGs) that function
as probabilistic logic gates. Each HMG is specified by
its inputs, outputs, and a state transition table that specifies
the probability of each output state based on input states
(Figure 2). For example, in the transition table of Figure 2
(a three-input, two-output gate), the probability $p_{73}$ controls
the likelihood that the output state is 3 (the decimal equivalent
of the binary pattern 11, that is, both output neurons
fire) given that the input happened to be state 7 (the decimal
translation of 111, i.e., all inputs are active). MNBs
can consist of any number of HMGs with any possible connection
arrangement, given certain constraints (see for example
Edlund et al. 2011).

Figure 2: Probabilistic logic gates in Markov network brains
with three inputs and two outputs. One of the outputs writes
into one of the inputs of this gate, so its output is “hidden.”
Because all Markov neurons automatically return to the
quiescent state after firing, values can only be kept in memory
by actively maintaining them. The probability table shows the
probability of each output given input values.
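As an illustration of how a single gate of this kind could be evaluated, the following minimal sketch reads the input bits as a decimal state, looks up the corresponding row of the transition table, and samples an output state from it. The function name `fire_gate` and the list-of-lists table layout are assumptions made for this example and are not taken from the original implementation.

```python
import random

def fire_gate(table, inputs, rng=random):
    """Sample the output bits of one probabilistic logic gate.

    table[i][j] is the probability of output state j given input state i
    (hypothetical layout chosen for this sketch).
    inputs is a list of 0/1 values, most significant bit first.
    """
    # Translate the binary input pattern into its decimal state, e.g. 111 -> 7.
    state = 0
    for bit in inputs:
        state = (state << 1) | bit
    row = table[state]
    # Draw one output state according to the probabilities in that row.
    out_state = rng.choices(range(len(row)), weights=row)[0]
    # Translate the decimal output state back into bits, e.g. 3 -> [1, 1].
    n_out = (len(row) - 1).bit_length()
    return [(out_state >> k) & 1 for k in reversed(range(n_out))]

# A deterministic three-input, two-output gate: only input state 7 fires both outputs.
and_like = [[1, 0, 0, 0] for _ in range(8)]
and_like[7] = [0, 0, 0, 1]
both_fire = fire_gate(and_like, [1, 1, 1])
```

When every row of the table contains a single 1 and zeros elsewhere, the sampling step always returns the same output and the gate behaves like a classical logic gate.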

The number of gates, their connections, and how they 
work is subject to evolution and changes across individu¬ 
als and through generations. For this purpose, the agent’s 
brains are encoded in a genome, which is an ordered se¬ 
quence of integers, each in the range [0,255], i.e., one byte. 
Each integer (or byte) is a locus in the genome and specific 
sequences of loci construct genes, where each gene codes for 
one HMG. The “start codon” for a gene (i.e., the sequence 
that determines the beginning of the gene) in our encoding 
is the pair (42,213) (these numbers are arbitrary). Each gene 
encodes exactly one HMG, for example as shown in Fig¬ 
ure 3. The gene specifies the number of inputs/outputs in 
each HMG, which nodes it reads from and writes to (the con¬ 
nectivity) and the probability table that determines the gates’ 
function. As shown in Figure 3, the first two bytes are the 
start codon, followed by one byte that specifies the number 
of inputs and one byte for the number of outputs. The bytes
are modulated so as to encode the number of inputs and outputs
unambiguously. For example, the byte encoding the
number of inputs is an integer in [0,255] whereas an HMG
can take a maximum of four inputs, thus we use a mapping
function that generates a number ∈ [1,4] from the value of
this byte. The next four bytes specify the inputs of the HMG,
followed by another four bytes specifying where it writes to.
The remaining bytes of the gene are mapped to construct the
probabilistic logic gate table. MNBs have been used extensively
in the last five years to study the evolution of navigation
(Edlund et al., 2011; Joshi et al., 2013), the evolution
of active categorical perception (Marstaller et al., 2013;
Albantakis et al., 2014), the evolution of swarming behavior as
noted earlier, as well as how visual cortices (Chapman et al.,
2013) and hierarchical groups (Hintze and Miromeni, 2014)
form. In this work, we force the gates to be deterministic
rather than probabilistic (all values in the logic table are 0 or
1), which turns our HMGs into classical logic gates.

Figure 3: An illustration of a portion of a genome containing two genes that encode two HMGs. The first two loci represent the start
codon (red blocks), followed by two loci that determine the number of inputs and outputs respectively (green blocks). The next
four loci specify which nodes are inputs of this gate (blue blocks) and the following four specify output nodes (yellow blocks).
The remaining loci encode the probabilities of the HMG’s logic table (cyan blocks).
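The gene layout described above can be sketched as a small decoder. The start codon (42,213) and the byte order follow the text; the concrete mapping functions (`1 + byte % 4` for the number of inputs and outputs, `byte % 16` for node indices), the circular read past the end of the genome, and the name `decode_genes` are assumptions made for this example, since the text does not specify them.

```python
START_CODON = (42, 213)   # from the text; the values themselves are arbitrary
N_NODES = 16              # 12 sensors + 3 internal nodes + 1 actuator

def decode_genes(genome):
    """Return one dictionary per gene found in a list of bytes (0..255)."""
    genes = []
    n = len(genome)
    for pos in range(n - 1):
        if (genome[pos], genome[pos + 1]) != START_CODON:
            continue

        def read(offset, start=pos):
            # Assumption: reads wrap around the end of the genome.
            return genome[(start + offset) % n]

        # Assumed mapping from a byte in [0,255] to a count in [1,4].
        n_in = 1 + read(2) % 4
        n_out = 1 + read(3) % 4
        # Four loci are reserved for inputs and four for outputs;
        # only the first n_in / n_out of them are used.
        inputs = [read(4 + k) % N_NODES for k in range(n_in)]
        outputs = [read(8 + k) % N_NODES for k in range(n_out)]
        # Remaining loci fill the logic table row by row (raw bytes kept here).
        rows, cols = 2 ** n_in, 2 ** n_out
        table = [[read(12 + r * cols + c) for c in range(cols)]
                 for r in range(rows)]
        genes.append({"n_in": n_in, "n_out": n_out,
                      "inputs": inputs, "outputs": outputs, "table": table})
    return genes

# A toy genome with a single gene starting at locus 5.
toy_genome = [0] * 40
toy_genome[5:17] = [42, 213, 6, 1, 0, 1, 2, 99, 15, 31, 0, 0]
toy_genes = decode_genes(toy_genome)
```

How the raw table bytes are converted into the 0/1 entries of a deterministic gate is left out of this sketch because the text does not describe that step.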

Experimental Configurations 

We construct an initial population of 100 agents (digital 
flies), each with a genome initialized with 5,000 random in¬ 
tegers containing four start codons (to jump-start evolution). 
Agents (and by proxy the genomes that determine them) are 
scored based on how they perform in their living environ¬ 
ment. The population of genomes is updated via a standard 
Genetic Algorithm (GA) for 50,000 generations, where the 
next generation of genomes is constructed via roulette wheel 
selection combined with mutations (detailed GA specifica¬ 
tions are listed in Table 1). To control for the effects of re¬ 
production and similar effects, there is no crossover or im¬ 
migration in our GA implementation. 
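One generation of such a GA can be sketched as fitness-proportional (roulette wheel) sampling of parents followed by per-locus point mutation. This is a simplified illustration rather than the original implementation: gene duplication and deletion are omitted, the shift that makes all selection weights positive is an assumption (fitness can be negative when penalties dominate), and the function names are hypothetical.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Sample len(population) parents with probability proportional to fitness."""
    # Assumption: shift fitnesses so that every weight is strictly positive.
    low = min(fitnesses)
    weights = [f - low + 1e-9 for f in fitnesses]
    return rng.choices(population, weights=weights, k=len(population))

def point_mutate(genome, rate, rng):
    """Replace each locus by a random byte with probability `rate`."""
    return [rng.randrange(256) if rng.random() < rate else locus
            for locus in genome]

def next_generation(population, fitnesses, rng, rate=0.005):
    """Asexual reproduction: selection plus mutation, no crossover or immigration."""
    parents = roulette_select(population, fitnesses, rng)
    return [point_mutate(parent, rate, rng) for parent in parents]

rng = random.Random(0)
population = [[0] * 50 for _ in range(10)]
offspring = next_generation(population, list(range(10)), rng)
```

The default `rate=0.005` corresponds to the 0.5% point mutation rate listed in Table 1.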

Each digital fly is put in a virtual world for 25,000 time 
steps, during which time its fitness score is evaluated. Dur¬ 
ing each time step in the simulation, the agent perceives 
its surrounding environment, processes the information with 
its MNB, and makes movement decisions according to the 
MNB outputs. The sensory system of a digital fly is designed
such that it can see surrounding objects within a limited
distance of 250 units, in a 280° pixelated retina shown
in Figure 4. The state of each sensor node is 0 (inactive)
when it does not sense anything within the radius, and turns 
to 1 (active) if an object is projected at that position in the 
retina. Agents in this experiment have one actuator node that 
enables them to move ahead or stop, for active (firing) and 
non-active (quiescent) states respectively. 
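The sensor rule can be sketched as follows. This is an illustration under several assumptions that the text does not state: the 12 pixels are taken to be of equal angular width, the 280° field is centered on the heading direction, and the object is treated as a point (so it activates at most one sensor, whereas an extended object can activate two, as in Figure 4). The function name `sensor_states` is hypothetical.

```python
import math

def sensor_states(fly_xy, heading_deg, obj_xy,
                  n_pixels=12, fov_deg=280.0, vision_range=250.0):
    """Return the 0/1 state of each retina pixel for a single point object."""
    states = [0] * n_pixels
    dx = obj_xy[0] - fly_xy[0]
    dy = obj_xy[1] - fly_xy[1]
    # Objects beyond the vision range are not sensed at all.
    if math.hypot(dx, dy) > vision_range:
        return states
    # Bearing of the object relative to the heading, wrapped to [-180, 180).
    bearing = math.degrees(math.atan2(dy, dx)) - heading_deg
    bearing = (bearing + 180.0) % 360.0 - 180.0
    half = fov_deg / 2.0
    # Objects in the blind zone behind the fly are not sensed.
    if abs(bearing) > half:
        return states
    # Assumed equal-width pixels: map the bearing to a pixel index.
    index = int((bearing + half) / (fov_deg / n_pixels))
    states[min(index, n_pixels - 1)] = 1
    return states

# Fly at the origin facing "up" (90 degrees); object slightly right of straight ahead.
ahead = sensor_states((0.0, 0.0), 90.0, (10.0, 100.0))
```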


GA Parameters                       Environment Parameters
Population size         100         Vision range         250
Generations             50,000      Field of vision      280°
Point mutation rate     0.5%        Collision range      60
Gene deletion rate      2%          Agent velocity       15
Gene duplication rate   5%          Event time steps     250
Initial genome length   5,000       No. of events        100
Initial start codons    4           Moving reward        0.004
Crossover               None        Collision penalty    1,2,3,5,10
Immigration             None        Replicates           20

Table 1: Configurations for GA and Environmental setup

In our experiment, the digital flies exist in an environment
where they must move to gain fitness, representing
the fact that organisms must move to forage for resources, find
mates, and avoid predators. Thus, the fitness function is set so
that agents are rewarded for moving ahead at each update 
of the world, and are penalized for colliding with objects. 
The amount of fitness they gain for moving (the benefit) is 
characteristic of the environment, and we change it in dif¬ 
ferent treatments. The penalty for collisions represents the 
importance of collision avoidance for their survival and re¬ 
production, and we vary this cost also. Each digital fly sees 
100 moving objects (one at a time) during its lifetime, and 
we say that it experiences 100 “events.” The penalty-reward 
ratio (PR) determines the amount of penalty of collision di¬ 
vided by the reward for moving during the entirety of an 
event. So for example, PR=1 means the agent loses all the 
rewards it gained by walking during the whole event if it 
collides with the object in that event: 

$$\text{fitness} = \sum_{\text{events}} (\text{reward} - \text{PR} \times \text{collision}) , \qquad (1)$$

Figure 4: The digital fly and its visual field in the model. 
Flies have a 12 pixel retina that is able to sense surrounding 
objects in 280° within a limited distance (250 units). The red 
circle is an external object that can be detected by the agent 
within its vision field. Activated sensors are shown in red, 
while inactive sensors are blue. In (a) the object activates 
two sensors, in (b) the object is detected in one sensor, and 
in (c) the object is outside the range. 


where reward ∈ [0,1] reflects how many time steps the
agent moved during the event. Our experiments are constructed
such that all objects that produce regressive motion
in the digital retina will collide with the fly if it keeps mov¬ 
ing. The reason for biasing our experiments in this manner 
is explained in the following section. 
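Equation (1) can be evaluated with a few lines of code. In this sketch the reward of an event is taken to be the fraction of its time steps during which the agent moved, and `collision` is taken to be 1 if the agent collided in that event and 0 otherwise; both readings, as well as the function name `lifetime_fitness` and the event tuple layout, are assumptions made for illustration.

```python
def lifetime_fitness(events, pr):
    """Sum (reward - PR * collision) over all events of one lifetime.

    events: list of (steps_moved, steps_total, collided) tuples (assumed layout).
    pr: penalty-reward ratio of the environment.
    """
    total = 0.0
    for steps_moved, steps_total, collided in events:
        reward = steps_moved / steps_total      # in [0, 1]
        collision = 1 if collided else 0
        total += reward - pr * collision
    return total

# An agent that always walks and collides in exactly one of 100 events, with PR=2.
always_walks = [(250, 250, False)] * 99 + [(250, 250, True)]
score = lifetime_fitness(always_walks, pr=2)
```

With these numbers the agent earns the full reward of 1 in each of the 100 events and pays a single penalty of 2, so `score` is 98.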

Collision Probability in Events with Regressive 
Optic Flow 

As mentioned earlier, Chalupka et al. (2015) showed that for 
two flies moving on straight, intersecting trajectories with 
constant velocities, the fly that reaches the intersection first 
always perceives progressive motion on its retina while the 
counterpart that reaches the intersection later perceives re¬ 
gressive motion at all times before the first fly reaches the 
intersection. However, this does not imply that all objects 
that produce a regressive motion on a fly’s retina will nec¬ 
essarily collide with it. In this section we present a mathe¬ 
matical analysis to discover how often objects that produce 
regressive motion in the fly’s retina will eventually collide 
with the fly if it continues walking. 

Suppose a fly moves on a straight line with constant velocity
$\mathbf{V}_{fly}$ and an object is also moving on a straight line
with constant velocity $\mathbf{V}_{obj}$ (Figure 5-a). The fly is able to
perceive objects within distance $R_{vis}$, its vision range (Figure 5-a).
The object is assumed to be a point in the plane and
the distance between this point and the center of the visual
field of the fly is defined to be the distance between them.
We define “the onset of the event” as the first time the object
is detected by the fly. At the onset of the event, the object
is at the distance $R_{vis}$ of the fly at relative azimuthal angle
$\alpha \in [0, \pi/2]$ (Figure 5-a). We assume that the object can be
at any relative position $\mathbf{R}_{vis} = (R_{vis}, \alpha)$ [1] with equal
probabilities (the probability distribution of $\alpha$ is uniform around
the fly). The velocity of the object can be represented as
$\mathbf{V}_{obj} = (V_{obj}, \theta)$ where $\theta \in [-\pi/2, 3\pi/2]$ (note that $V_{obj}$ is
constant). We also assume that the velocity of the object can
point in all directions with equal probabilities (the probability
distribution of $\theta$ is uniform). The relative velocity of the
object with respect to the fly is $\mathbf{V}_{rel} = \mathbf{V}_{obj} - \mathbf{V}_{fly}$ (Figure 5).
Since both $\mathbf{V}_{obj}$ and $\mathbf{V}_{fly}$ are constant, $\mathbf{V}_{rel}$ is also a
constant vector.

[1] Here and below, we represent vectors either in boldface or by
the parameters that determine them within a planar polar coordinate
system. Thus the vector $\mathbf{R}$ is represented by $(|\mathbf{R}|, \theta)$, where
$R_x = R \cos\theta$ and $R_y = R \sin\theta$.

Figure 5: An illustration of a moving fly at the onset of the
event.


Proposition 1. A moving object produces regressive motion
on a fly’s retina if:

$$\theta > -\alpha + \arcsin\left(\frac{V_{fly}}{V_{obj}} \cos\alpha\right) . \qquad (2)$$

Proof. In order for the object to produce regressive motion
on the retina, the relative velocity should be pointed above
the center point O. The relative velocity direction $\gamma$ can be
found with $\mathbf{V}_{rel} = (V_{rel}, \gamma)$, as

$$\gamma = \arctan\left(\frac{V_{rel,y}}{V_{rel,x}}\right) = \arctan\left(\frac{V_{obj} \sin\theta - V_{fly}}{V_{obj} \cos\theta}\right) . \qquad (3)$$

The angle $\gamma$ should be greater than the central angle (Figure 5-b),
that is, $\gamma > -\alpha$. Replacing $\gamma$ and simplifying, we
obtain:

$$\theta > -\alpha + \arcsin(\nu \cos\alpha), \quad \nu = \frac{V_{fly}}{V_{obj}} . \qquad (4)$$

For smaller values of $\theta$, the object produces progressive optic
flow. We thus define $\theta_{min} = -\alpha + \arcsin(\nu \cos\alpha)$ as
the minimum angle $\theta$ that produces regressive motion on the
retina.
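The simplification step in this proof can be spelled out as follows. This is a sketch under the assumptions that $\cos\theta > 0$ and $\cos\alpha > 0$ (so that multiplying through preserves the inequality) and that $\theta + \alpha$ lies in $[-\pi/2, \pi/2]$, where the sine is increasing; $\nu = V_{fly}/V_{obj}$ is the velocity ratio used in the text.

```latex
\begin{align*}
\gamma > -\alpha
  &\iff \frac{V_{obj}\sin\theta - V_{fly}}{V_{obj}\cos\theta} > -\frac{\sin\alpha}{\cos\alpha} \\
  &\iff V_{obj}\left(\sin\theta\cos\alpha + \cos\theta\sin\alpha\right) > V_{fly}\cos\alpha \\
  &\iff \sin(\theta + \alpha) > \nu\cos\alpha \\
  &\iff \theta > -\alpha + \arcsin(\nu\cos\alpha) .
\end{align*}
```

The same manipulation with a different bound on $\gamma$ yields the conditions for observability and for collision.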


Definition 1. The object remains “observable” to the fly 
after the onset of the event if its relative velocity is directed 
toward the inside of the fly’s vision field (to the left of the 
tangent line Si in Figure 5-b). 


Proposition 2. The object remains observable to the fly if:

$$\theta < \arccos\left(-\frac{V_{fly}}{V_{obj}} \sin\alpha\right) - \alpha . \qquad (5)$$

Proof. According to the definition, the sufficient condition
for observability is that $\gamma$ should be less than the angle of the
tangent line Si: $\gamma < -\alpha + \pi/2$. Replacing $\gamma$ and simplifying we
obtain

$$\theta < \arccos(-\nu \sin\alpha) - \alpha . \qquad (6)$$

For greater values of $\theta$, the object will be out of vision range
of the fly. Thus the maximum value that $\theta$ can take on is:

$$\theta_{max} = \arccos(-\nu \sin\alpha) - \alpha . \qquad (7)$$

In order for the object to produce regressive motion on the fly’s
retina and also remain observable to the fly, the relative velocity
should be within the arc $\psi$ (Figure 5-b).

Definition 2. The object collides with the fly if its distance
from the fly is less than the “collision range” $R_{coll}$ (Figure 5-b).


Proposition 3. An object that creates regressive optic flow
on the fly’s retina and remains observable will collide with
it if:

$$\theta < \varphi + \arcsin(\nu \cos\varphi), \quad \varphi = \arcsin\left(\frac{R_{coll}}{R_{vis}}\right) - \alpha . \qquad (8)$$

Proof. The relative velocity of such an object is within the arc $\psi$.
This object will collide with the fly if its relative velocity
is within the arc spanned by the angle $\beta$, i.e. lower than the
tangent line to the collision circle (Figure 5-b). This condition
holds true if:

$$\gamma < \beta - \alpha, \quad \beta = \arcsin\left(\frac{R_{coll}}{R_{vis}}\right) . \qquad (9)$$

Let $\rho = R_{coll}/R_{vis}$ and $\varphi = \beta - \alpha$. Replacing $\gamma$ and rearranging
gives:

$$\theta < \varphi + \arcsin(\nu \cos\varphi) . \qquad (10)$$

For greater values of $\theta$, the object produces regressive motion
on the fly’s retina but does not collide with it. So the
threshold collision angle is given by:

$$\theta_{coll} = \varphi + \arcsin(\nu \cos\varphi) . \qquad (11)$$

As mentioned, we assume that the probability distribution of
the direction of the object velocity, $\theta$, is uniform.

Definition 3. For an object at initial position $\alpha$, the probability
$\Pi_{coll}$ is the range of velocity directions $\theta$ such that the
object collides with the fly divided by the range of directions
with which it creates regressive optic flow on the fly’s retina (see
Figure 5-b):

$$\Pi_{coll}(\alpha, \nu, \rho) = \frac{\theta_{coll} - \theta_{min}}{\theta_{max} - \theta_{min}} . \qquad (12)$$

Integrating this function over the range of possible initial
relative positions, the probability that an event results in a
collision given that the object produces regressive motion
on a fly’s retina can be found as:

$$\Pi_{coll}(\nu, \rho) = \int_{\alpha_{min}}^{\alpha_{max}} \Pi_{coll}(\alpha, \nu, \rho) \, d\alpha , \qquad (13)$$

where $\alpha_{min}$ is either 0 or the minimum value of $\alpha$ for which
there exists a $\theta$ with which the object can produce a regressive
motion on the fly’s retina, and $\alpha_{max}$ is either 90° or the maximum
value of $\alpha$ for which there exists a $\theta$ with which the
object remains observable to the fly.

Figure 6: Probability of collision $\Pi_{coll}(\nu, \rho)$ with an object
that creates regressive motion on the retina as a function of
the ratio of collision radius to vision radius $\rho$, for different
fly-object velocity ratios $\nu$.
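The quantities defined in this section can be evaluated numerically along the following lines. This is a minimal sketch, not the original computation: it assumes $\nu \le 1$ so that every arcsine argument is valid, clips the ratio of Eq. (12) to [0,1], and averages over $\alpha \in [0, \pi/2]$ with a midpoint rule, which amounts to normalizing the integral of Eq. (13) by the length of the $\alpha$ interval (an assumption about the normalization). All function names are hypothetical.

```python
import math

def theta_min(alpha, nu):
    """Eq. (4): smallest velocity direction that gives regressive motion."""
    return -alpha + math.asin(nu * math.cos(alpha))

def theta_max(alpha, nu):
    """Eq. (7): largest velocity direction for which the object stays observable."""
    return math.acos(-nu * math.sin(alpha)) - alpha

def theta_coll(alpha, nu, rho):
    """Eq. (11): threshold direction below which the object collides."""
    phi = math.asin(rho) - alpha
    return phi + math.asin(nu * math.cos(phi))

def p_coll_given_alpha(alpha, nu, rho):
    """Eq. (12), clipped to [0, 1]; None if no regressive, observable range exists."""
    if not (0.0 <= nu <= 1.0 and 0.0 <= rho <= 1.0):
        raise ValueError("this sketch assumes 0 <= nu <= 1 and 0 <= rho <= 1")
    lo, hi = theta_min(alpha, nu), theta_max(alpha, nu)
    if hi <= lo:
        return None
    frac = (theta_coll(alpha, nu, rho) - lo) / (hi - lo)
    return min(1.0, max(0.0, frac))

def p_coll(nu, rho, n=1000):
    """Midpoint-rule average of Eq. (12) over alpha in [0, pi/2]."""
    width = (math.pi / 2.0) / n
    values = []
    for i in range(n):
        p = p_coll_given_alpha((i + 0.5) * width, nu, rho)
        if p is not None:
            values.append(p)
    return sum(values) / len(values) if values else float("nan")

# Example: object at 45 degrees, fly half as fast as the object, rho = 15/60.
example = p_coll_given_alpha(math.pi / 4.0, 0.5, 0.25)
```

For this example the ratio of Eq. (12) comes out slightly above 0.21.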

We calculated the integral (13) numerically and show the
results in Figure 6 for different values of fly-object velocity
ratios $\nu$ and different collision range-vision range ratios
$\rho$. As can be seen from Figure 6, for $R_{vis}$=60 mm (Zabala
et al., 2012) and $R_{coll}$=15 mm (our assumption), the
collision probability is around 0.2-0.3. This implies that if
encounters are created randomly, regressive motion on the 
retina is not predictive of collision, and as a consequence it 
is unreasonable to expect that digital evolution will produce 
collision avoidance in response, as only 1 in 5 to 1 in 3 re¬ 
gressive motions actually lead to collisions. This was borne 
out in experiments, and we thus decided to bias the events in 
such a manner that all events that leave a regressive motion 
signature in the retina will lead to collision. Note that this 
is not necessarily an unrealistic assumption, as we have not 
analyzed a distribution of realistic “events” (such as is avail¬ 
able in the data set of Branson et al. 2009). It could very 
well be that the way real flies approach each other differs 
from the uniform distributions that went into the mathemat¬ 
ical analysis presented here. 

Results 

We conducted experiments with five different fitness func¬ 
tions representing different environments. Environments 
differ in the amount of fitness individuals gain when moving 
and in the penalty incurred by a collision. Evolved agents
use various strategies to avoid collisions and maximize the 
travelled distance, but one of the most successful strategies 
they use is indeed to categorize visual cues into regressive 
and progressive optic flows. We find that agents categorize 
these visual cues only in some regions of the retina: the re¬ 
gions in which collisions take place more frequently. They 
then use this information to cast a movement decision: they 
keep moving when seeing an object creating progressive op¬ 
tic flow on their retina, and stop when the object creates re¬ 
gressive optic flow on their retina. However, they do not 
stop for the entire duration of the event, i.e., the whole time 
they perceive regressive optic flow. Rather they stop during 
only a portion of the event, which helps the agent to avoid 
a collision with the object while maximizing their walking 
duration and hence gaining higher fitness. 

The strategy of using regressive motion as a cue for colli¬ 
sion (Chalupka et al., 2015), similar to the observed behavior 
in fruit flies (Zabala et al., 2012) evolves in our experimental 
setup under some environmental circumstances (discussed 
below). We refer to this strategy as regressive-collision-cue 
(RCC) and we define it in our experimental setup as follows: 

1) The moving object produces regressive motion on the 
agent’s retina during an event and the agent stops at least 
for some time during that event, or 

2) The moving object produces progressive motion on the
agent’s retina during an event and the agent does not stop
during that event.

The number of events (out of 100) in
which the agent uses this strategy is termed the “RCC value.”
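The two conditions above amount to counting the events in which "the motion was regressive" and "the agent stopped" are either both true or both false. A minimal sketch follows, with a hypothetical event representation as (is_regressive, stopped) pairs and an illustrative 50/50 split between regressive and progressive events.

```python
def rcc_value(events):
    """Count events that follow the regressive-collision-cue strategy.

    events: iterable of (is_regressive, stopped) boolean pairs (assumed layout).
    An event counts if the agent stopped on regressive motion,
    or did not stop on progressive motion.
    """
    return sum(1 for is_regressive, stopped in events if is_regressive == stopped)

# Illustrative lifetime: one missed stop on regressive motion,
# two unnecessary stops on progressive motion.
lifetime = ([(True, True)] * 49 + [(True, False)]
            + [(False, False)] * 48 + [(False, True)] * 2)
value = rcc_value(lifetime)
```

For this illustrative lifetime, `value` is 97.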

We now discuss the results of an experiment in which 
the RCC strategy has evolved. We take the most successful 
agent at the end of that experiment and analyze its behavior. 
This agent evolved in an environment with penalty-reward
ratio of 2, meaning the penalty of each collision equals twice
the maximum reward the agent can gain in one event. Figure 7
shows whether the agent stopped during an event, stop prob¬ 
ability (blue triangles), as a function of the angular velocity 
of the image on the agent’s retina for 100 events. In that 
figure, the angular velocity of the image on agent’s retina 
is negative for regressive optic flow and positive for progressive
events. Simulation units are converted to plotted
values (in deg/s and mm/s) by equating the dimensionless values
$\nu$ and $\rho$ in the simulation with their actual values: $R_{vis}$=60 mm
(Zabala et al., 2012), $V_{fly}$=20 mm/s (Zabala et al., 2012),
$R_{coll}$=15 mm (our assumption). We can see from the figure
that out of all 100 events, the agent did not stop during
one event with regressive motion while for two progressive 
events, it stopped. In the remaining events the agent accu¬ 
rately uses the RCC strategy (resulting in an RCC value=97). 
The average velocity of the agent during each event is also 
shown (solid orange circles), which reflects the number of 
time steps the agent moves during that event (and thus indirectly
how often it stops). For progressive motions, the
stop probability is zero (the agent continues to move during
the event) and thus the velocity of the agent is maximal
during that event. For regressive optic flow (negative angular
velocities), the average velocity during each event is less
than the maximum, though less so for extreme angular velocities,
where the agent only needs to stop for shorter durations to
avoid collisions.

Figure 7: The stop probability of the evolved agent vs. the
angular velocity of the image on its retina for 100 events.
Positive values of angular velocity show progressive motion
events and negative angular velocities stand for regressive
motion events. The average velocity of the agent is also
shown during each event.

In order to quantitatively analyze how using regressive
motion as a collision cue helps agents gain fitness,
we traced this particular agent’s evolutionary line of descent
(LOD) by following its lineage backwards for 50,000 generations
mutation by mutation until we reached the random
agent that we used to seed the initial population (see Lenski
et al. 2003 for more details on how to construct evolutionary
lines of descent for digital organisms). Figure 8 shows
the fitness and the RCC value vs. generation for this agent’s
LOD. It is evident from these results that evolving this strategy
benefits agents in gaining fitness compared to the rest
of the population in this environment, as high peaks of fitness
occur at high RCC values and conversely, the fitness
drops as the RCC value decreases. Nevertheless, this strategy
does not evolve all the time. Figure 9 shows the fitness
and RCC for all 20 replicates in the environment with
penalty-reward ratio of 2. We can see that the mean fitness
of all 20 replicates is around 20% less than the fitness of the
agent that evolved the RCC strategy. The mean RCC value
for all 20 replicates is also ≈20% less than that of an agent
that evolved the RCC strategy.
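Because reproduction is asexual here (no crossover), every agent has exactly one parent, so a line of descent can be recovered by following parent links backwards from the final agent. The sketch below assumes that a parent record was stored for every agent during the run; the dictionary `parent_of` and the function name are hypothetical.

```python
def line_of_descent(final_id, parent_of):
    """Return the lineage from the seed ancestor to the final agent.

    parent_of maps each agent id to its parent's id, or to None for
    agents of the initial population (assumed bookkeeping).
    """
    lineage = [final_id]
    # Walk backwards until an agent without a recorded parent is reached.
    while parent_of.get(lineage[-1]) is not None:
        lineage.append(parent_of[lineage[-1]])
    lineage.reverse()
    return lineage

# Toy genealogy: "a" seeds the population; "x" is a side branch that died out.
toy_parents = {"a": None, "b": "a", "x": "a", "c": "b"}
lod = line_of_descent("c", toy_parents)
```

Fitness and RCC values can then be evaluated for each agent on the returned list to produce curves of the kind shown for the line of descent.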

The difficulty of evolving the RCC strategy is not limited
to the fraction of replicates in which this behavior evolves
within a given environment (we also tried running
the experiment for longer evolutionary times but the results
do not change significantly). Environmental conditions also
play a key role in the evolution of this behavior. Figure 10
shows the RCC value distribution for 20 replicates in five
different environments. In order to calculate the RCC value
in each replicate, we took the average of the RCC value in
the last 1,000 generations on the line of descent to compensate
for fluctuations. We observe that the RCC strategy only
evolves in a narrow range of penalty-reward ratios, namely
for PR=2 and PR=3. According to Figure 10, higher values
of the penalty on the one hand discourage the agents from
walking in the environment (they simply choose to remain
stationary), and therefore prevent them from exploring the
fitness landscape. Lower values for the penalty, on the other
hand, result in indifference to collisions and thus, the optimal
strategy (probably a local optimum) in these environments
is to keep walking and ignore all collisions. For lower
values of the penalty, the RCC value is ≈ 55%, which means
agents evolve to stop only in obvious cases that end up in collision
(if they always kept moving, the RCC value would be 50).

Figure 8: Fitness and regressive-collision-cue (RCC) value
on the line of descent for an agent that evolved RCC as a
strategy to avoid collisions. Only the first 20,000 generations
are shown, for every 500 generations.

Figure 9: Mean values of fitness and regressive-collision-cue
(RCC) over all 20 replicates vs. evolutionary time in
the line of descent in the environment with penalty-reward
ratio of 2. Standard error lines are shown with shaded areas
around mean values. Only the first 20,000 generations are
shown, for every 500 generations.

Discussion

We used an agent-based model of flies equipped with MNBs 
that evolve via a GA to study the selective pressures and en¬ 
vironmental conditions that can lead to the evolution of col¬ 
lision avoidance strategies based on visual information. We 
specifically tested cognitive models that invoke “regressive 
motion saliency” and “regressive motion as a cue for colli¬ 
sion” to understand how flies avoid colliding with each other 
in two-dimensional walks. We showed that it is possible to 
configure the experiment in such a manner that “regressive- 
collision-cue” (RCC) evolves as a strategy to avoid colli¬ 
sions. However, the conditions under which the RCC strat¬ 
egy evolved in our experiments are limited: the strategy only 
evolved in a narrow range of environmental conditions and 
even in those environments, it does not evolve all the time. 
In addition, we showed that from general principles, only a 
small percentage of events in which an agent perceives re¬ 
gressive optical flow eventually leads to a collision, so that 
RCC as a sole strategy is expected to have a large false pos¬ 
itive rate, leading to unnecessary stops. 

As discussed in the Methods section, our experimental 
implementation is biased in such a way that all regressive 
motion events lead to a collision if the agent does not stop 
during that event. If the moving object’s velocity direction 
is distributed uniformly randomly in all directions, the prob¬ 
ability that a regressive event ends up in a collision is rather 
low (≈ 20% in our implementations). Because the false-positive
rate of using regressive optical flow as the only predictor
of collisions is liable to thwart the evolution of an RCC
strategy, we biased our setup in such a way that the false-positive
rate is zero, a bias that does not significantly influence
the outcome of our experiments. Consider an environment
in which only a percentage of events with regressive
motion end up in collision. This is similar to an environ¬ 
ment with a lower penalty for collisions (as long as the strat¬ 
egy evolves at all) since the agent’s fitness is scored at the 
end of its lifetime (all 100 events) not during each event. 

However, there is a difference between a lower percentage 
of collisions in regressive events and lower penalty for col¬ 
lisions, namely a lower probability of collision in regressive 
motion events is equivalent to a higher amount of noise in 
the cue that the agent takes from the environment, compared 
to the case of lower penalties for collision. In other words, 
if 100% of all regressive motion events lead to collisions, 
the agent associates regressive motion events with collisions 
with certainty. Thus, implementing the experiments with 
100% collisions in regressive motion events is tantamount 
to eliminating the noise in sensory information, which gen¬ 
erally aids evolution. Compensating for noise in sensory in¬ 
formation could also be achieved if we scored agents in ev¬ 
ery single event, and informed them about their performance 
in that event (feedback learning). We did not use feedback 
learning here, but plan to do so in future experiments. 

Figure 10: RCC value distribution in environments with different
penalty-reward ratios. Each box-plot shows the RCC
value averaged over the last 1,000 generations on the line of
descent for 20 replicates.

We conclude that the evolution of “regressive motion
saliency” is unlikely to have happened only due to collision
avoidance as the selective pressure. It is important to
remember that walking is not the most frequent activity in 
fruit flies. Further, flies do not usually live in high density 
colonies and therefore do not find themselves on collision 
courses very often. It may be the case that components
of this strategy (namely categorizing the optic flow as regressive
or progressive) evolved under different selective
pressures entirely unrelated to the present test situation,
and were later co-opted to enhance collision avoidance with
conspecifics while moving (a type of exaptation). For example,
detecting predators is a strong selective pressure in the
evolution of visual motion detection, including the catego¬ 
rization of that cue so as to take appropriate actions. It may 
be interesting to study the behavior of flies interacting with 
animals or objects that are not perceived as conspecifics. 
Abstract

Eukaryo is a 3D, interactive simulation of a eukaryotic cell. 
In comparison to existing cell simulations, our model illus¬ 
trates the structures and processes within a biological cell 
with increased fidelity and a higher degree of real-time in¬ 
teractivity using a virtual reality environment. Implemented 
in a game engine, Eukaryo is a hybrid model that combines 
agent-based and mathematical modelling. 

Through the use of visual scripting, Eukaryo incorporates 
both agent-based modelling and mathematical representa¬ 
tions to describe gene expression, energy production and 
waste removal within the cell in a highly visual, interac¬ 
tive simulation environment. With the help of virtual real¬ 
ity displays, users can be immersed in the crowded spaces 
of biomolecular worlds and observe metabolic reactions at a 
high level of detail. Compared to traditional media, such as 
illustrations and videos, Eukaryo offers superior representa¬ 
tions of cellular architecture, its components and dynamics of 
the machineries of life. 


Motivation 

The concept of scaling humans to a molecular size to facilitate
exploration of biological systems at the cellular level
forms the premise for the 1966 film “Fantastic Voyage”
(Fleischer, 1966; Asimov, 1988). A group of scientists explores
the human body by means of a miniaturised submarine.
If such journeys at molecular scale were feasible
today, they would offer unparalleled experiences to explore
the molecular universes of a biological cell. Illustrations
of biomolecular worlds have been limited to a few
prominent examples only. David Goodsell’s “The Machinery
of Life” is one such example (Goodsell, 2009). Meticulously
constructed from X-ray crystallography, NMR and
high-resolution electron micrographs, Goodsell’s illustra¬ 
tions capture the densely packed environments inside a cell 
at the molecular level. Contrary to impressions from typ¬ 
ical textbook illustrations, these cellular spaces are highly 
crowded. Accentuating the complexity of a cell, Goodsell 
suggests that every structure visible in his illustrations is 
likely supported and regulated by a myriad of other struc¬ 
tures that are not visible. 



Figure 1: A Generic Eukaryotic Cell. The major organelles 
of a eukaryotic cell are replicated in the Eukaryo model. 

Despite capturing the density of materials within a cell, 
renderings - even at the highly detailed level of Goodsell’s 
illustrations - remain static and cannot depict how the
different structures interact with one another. As a
result, textbook illustrations tend to be supplemented with 
videos that can portray the progression of sophisticated cellular
biochemical reactions and pathways. The BioVisions
video “Inner Life of a Cell” makes use of 3D computer animation
to represent some of the key processes that occur in
a eukaryotic cell, ranging from gene expression to cellular
transport (Harvard BioVisions, 2007). Despite being more
expressive, videos limit viewers to observing events from 
predetermined camera perspectives. Hence, videos do not 
permit exploration or interaction with the model. 

In order to provide an interactive, exploratory environ¬ 
ment that - to a certain degree of accuracy - captures the 
sense of complexity underlying the machinery of life, we 
have implemented Eukaryo, a virtual model of a generic eukaryotic
cell (Fig. 1). Built using the game development
software Unreal Engine (Epic Games, 2015), our model
strives to capture the key structures and functioning units
of a cell, similar to Goodsell’s illustrations and the BioVisions
animations. Furthermore, by utilizing virtual reality
(VR) interfaces, Eukaryo enables users to interact with the
simulation, navigate to different locations within the cell,
investigate its architecture, and explore molecular structures
and metabolic pathways. 

Using virtual reality visualizations and by combining the 
interactivity of a video game and artistic renderings of cellular
structures with accurate representations of proteins
(downloaded from the Protein Data Bank (Westbrook and
Fitzgerald, 2003)), Eukaryo can provide a sense of dynamics
and immersion reminiscent of “Fantastic Voyage”
(Fig. 2). Not only is this a novel and engaging learning 
approach, this experience also provides a more effective 
learning method through computational models (Laha et al., 
2014). 

Related Work 

While biological systems are still mostly studied in vivo,
many processes often cannot occur in isolation from living
systems or are too difficult to explore. Computer models 
overcome this constraint, allowing for individual processes 
to be observed in greater detail and providing solutions to¬ 
wards virtual experiments - ideally starting at the cellular 
level. 

Cells. Because cells constitute the foundational building block of any
biological system, simulations of cells are of particular interest.
Projects such as E-CELL (Tomita et al., 1999) and Virtual
Cell (Loew and Schaff, 2001) provide the frameworks
for modelling interconnected processes inside a biological 
cell. E-CELL can simulate signaling, cellular reactions and 
gene regulation, while Virtual Cell is intended to act as a 
more general platform for simulations of micro-biological 
systems. 

Both E-CELL and Virtual Cell produce numerical outputs only.
Actual visualization or illustration of how the system evolves
over time is left to other tools and the user’s imagination.
Moreover, these models are limited in their interactivity:
while they are running, users cannot visually observe a process
or pause a simulation to inspect its current state.

More recent cell models have been built as sophisticated
mathematical models for predictive simulations. For example,
Karr et al.’s whole-cell model captures all known processes in
the bacterium M. genitalium (Karr et al., 2012). While such
models are powerful, they are also complex to set up. For
instance, the whole-cell model requires twenty-eight separate
modules running concurrently to represent one cell and is
controlled by 1,900 empirically determined values as input
parameters. In order to better manage such complex control
structures, we utilize hierarchical visual scripting similar to
pathway interaction diagrams.

Molecular Dynamics. UnityMol (Lv et al., 2013) and
MolecularRift (Norrby, 2015) have been developed using the
Unity game engine (Unity Technologies, 2012) to visualize
molecules. MolecularRift works on the Oculus Rift (Oculus,
2015), whereas UnityMol has some virtual reality support, such
as for immersive CAVE visualizations. These applications
demonstrate the opportunities offered by game engines for
biomolecular visualization and simulation, as well as the
benefit of virtual reality for molecular visualization.
However, they are focused on molecular dynamics and drug design
applications rather than full cell simulations.

Figure 2: The cellular space with control elements to adjust
the level of detail and a minimap for location reference.

Agent-based Modeling. Agent-based methods have been used as
simulation techniques complementary to purely mathematical
models (Haefner, 2005). For example, a gene regulatory model of
the λ-switch has been recreated in a 3D, purely agent-driven
simulation (Jacob et al., 2006). A classic gene regulation model
studied in E. coli bacteria, the lactose operon, has been
implemented to illustrate the protein interactions that
determine gene expression (Jacob and Burleigh, 2004, 2006). In
comparison to other simulators, LINDSAY Composer offers a 3D,
interactive environment where models can be directly constructed
in virtual, 3-dimensional spaces (Jacob et al., 2012). The
feasibility of such agent-based models in game engine-inspired
environments has been demonstrated in an immune system
simulation based purely on interacting components (Sarpe and
Jacob, 2013) and in simulations of gene expression in E. coli,
including transcription and translation (Esmaeili et al., 2015),
as well as chemotaxis.

Implementing a Eukaryotic Cell

Eukaryo is implemented in Unreal Engine (Epic Games, 2015) as a
hybrid, interactive 3D model that combines agent-based modeling
with mathematical techniques (using differential equations).
Agent-based models (ABMs) simulate behaviours by defining a set
of interactions between agents in a system. We adopt a general
approach to define an agent as (1) a set of situations an agent
may be in, (2) its set of actions, (3) all of the possible
combinations of its internal data and (4) a decision function
that triggers an action based on the situation and internal
data (Afsharch et al., 2006). An ABM constitutes the set of all
agents, where the total activity of the agents in an
environment forms the ABM’s overall behaviour. As such, ABMs
are suitable for describing biological systems because
observations in biology can be translated into rules for agents
within an ABM. Moreover, new information, such as higher levels
of detail regarding biomolecular processes, can be incorporated
into an ABM simply by adding new agents and/or rules (Jacob et
al., 2012). Compared to mathematical models, where equations
define system behaviour, ABMs are more difficult to validate
and require more computational power to run. However,
mathematical models are specific to the processes they
formalize and tend to be harder to adapt for describing other
systems (Haefner, 2005; Edelstein-Keshet, 1988). A hybrid model
confers the benefits of both agent-based and mathematical
modelling (Esmaeili et al., 2015), which complement one another
to produce more powerful, flexible and extensible simulations,
as we demonstrate with the Eukaryo model.

Figure 3: Level setup in Eukaryo. The simulation begins in the
cytosol (Cell level) from which users may explore organelles
such as Mitochondria and the Nucleus in greater detail, which
are implemented as separate levels.
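
The four-part agent definition given in this section (situations, actions, internal data, decision function) can be sketched in a few lines of Python. This is an illustrative sketch only and not Eukaryo's Blueprint implementation: the class name `Agent`, the situation labels and the toy enzyme rule are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical names throughout: this mirrors the four-part agent definition
# from the text, not any code from Eukaryo itself.


@dataclass
class Agent:
    situations: List[str]                            # (1) situations the agent may be in
    actions: Dict[str, Callable[["Agent"], None]]    # (2) available actions, by name
    decide: Callable[[str, Dict[str, float]], str]   # (4) decision function
    data: Dict[str, float] = field(default_factory=dict)  # (3) internal data

    def step(self, situation: str) -> str:
        """Choose an action for the current situation and execute it."""
        if situation not in self.situations:
            raise ValueError(f"unknown situation: {situation}")
        name = self.decide(situation, self.data)
        self.actions[name](self)
        return name


def make_enzyme() -> Agent:
    """Build a toy enzyme agent that converts substrate while any is left."""

    def react(agent: Agent) -> None:
        agent.data["substrate"] -= 1
        agent.data["product"] += 1

    def idle(agent: Agent) -> None:
        pass

    def decide(situation: str, data: Dict[str, float]) -> str:
        if situation == "touching_substrate" and data["substrate"] > 0:
            return "react"
        return "idle"

    return Agent(
        situations=["free", "touching_substrate"],
        actions={"react": react, "idle": idle},
        decide=decide,
        data={"substrate": 2, "product": 0},
    )
```

Adding a new rule in this sketch means adding an entry to `actions` and a branch to `decide`, which is the sense in which an ABM is extended by adding agents and rules.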

Cell Universes as Game Levels 

In comparison to prokaryotic (i.e., bacterial) cells,
eukaryotic cells are more complex, consisting of organelles
that carry out specific functions (Figs. 1 and 2).
Compartmentalization allows the cell to regulate distinct
environments (Jekely, 2007). In Eukaryo, the simulation
consists of three interconnected environments implemented as
(game) levels (Fig. 3). Each level contains a set of Actors;
the levels represent the cell as a whole (Cell) and two
organelles (Mitochondria and Nucleus). In Unreal Engine, an
Actor¹ is a collection of components that define its location,
size and appearance. An Actor is inert unless its event graph
is implemented to specify its attributes. The permissible
values for these attributes are the set of possible actions
available to the Actor and the conditions that result in a
particular action.

¹ We use the term Actor, instead of agent, to refer to an actor
entity within the Unreal Engine game programming system.

Figure 4: Modular Architecture of Eukaryo. The game instance
stores the cell’s state. Different levels receive information
from the game instance to determine their local environment.
Each level consists of a collection of Actors. Some Actors have
meshes for graphical representation and Blueprints to specify
their actions.

Actors can be affected by and affect their environment
(Fig. 4): information applying to the entire level is stored
within persistent data structures known as Game Instances,
which enable users to transition between different levels to
view cellular processes at different levels of detail (Fig. 3).
Blueprint Actors can communicate with the Game Instances in a
similar manner as they would with other Actors. Event graphs
for each Blueprint Actor (Fig. 6)² make use of events, special
conditions tracked by the game engine, such as collisions and
frame updates, that serve as triggers for interactions among
agents.

Visual Scripting with Blueprints 

Biochemical reactions proceed after collisions that bring
molecules and enzymes in close proximity to each other. This
forms the basis for all reactions within a cell (Gold, 2014).
In a game engine, a biological reaction corresponds to a
collision between two Actors that results in a state change in
at least one Actor. While this makes it straightforward to
describe reactions in terms of collision events, actual
biological reactions occur at very high rates determined (and
measured) by the densities of, e.g., substrate, enzyme and
signal molecules in the cytosol. The concentrations of these
reactants are high enough in the cell that the probability of
two reactants colliding and reacting is also high (in humans,
each cell may have upwards of 1.7 billion proteins) (Milo,
2013). However, using a large number of Actors to reproduce
such a high-density, reactive environment in a game engine is
(currently) computationally infeasible, especially if we want
to execute our model on commodity computing devices.

² Details of this script are explained in a later section.

Figure 5: Layout for a Generic Blueprint. Every Blueprint that
an Actor uses in Eukaryo follows a similar layout. After its
variables are initialised, on every step in the simulation, the
Actor will look for the nearest Actor A of type T. If ready for
a reaction, both Actors will move towards each other. On a
collision, they become activated and partake in the reaction.

As such, Eukaryo makes use of an alternative mechanism to
facilitate an accurate yet illustrative representation of
biochemical reactions. In Unreal Engine, each Actor has a
transform component that keeps track of its location within a
level. Other Actors may access this value, allowing an Actor to
compute its distance relative to any other Actor. Using the
Blueprint Function Library common to all Actors, one can
calculate an Actor’s distance to all Actors of a certain type
and determine the closest agent (Fig. 5). We use this
information in conjunction with the iTween animation plugin to
smoothly translate Actors from one point to another
(Therriault, 2016). Together, this allows different Actors to
be moved rapidly towards other Actors and initiate a reaction
without having to wait for molecule agents to collide with one
another at random. While this is not a true representation of
how chemical reactions proceed in vivo, it allows reactions to
be replicated more readily for illustrative purposes. The
actual densities of highlighted proteins and other molecules
are controlled by mathematical equations and visualized by
particle systems as described below.
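
The "find the nearest Actor of a given type, then move towards it" mechanism can be approximated outside the game engine as follows. This is a minimal sketch under stated assumptions: `Actor`, `nearest_of_type` and `move_towards` are hypothetical names, and a fixed-length linear step stands in for the smooth iTween animation, whose actual API is not shown here.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Actor:
    kind: str        # e.g. "enzyme" or "substrate" (illustrative labels)
    location: Vec3   # stands in for the Actor's transform component


def nearest_of_type(me: Actor, actors: List[Actor], kind: str) -> Optional[Actor]:
    """Return the closest other actor of the requested kind, if there is one."""
    candidates = [a for a in actors if a is not me and a.kind == kind]
    if not candidates:
        return None
    return min(candidates, key=lambda a: math.dist(me.location, a.location))


def move_towards(me: Actor, target: Vec3, max_step: float) -> bool:
    """Move at most max_step towards target; True means arrival (the 'collision')."""
    distance = math.dist(me.location, target)
    if distance <= max_step:
        me.location = target
        return True
    t = max_step / distance
    me.location = tuple(p + t * (q - p) for p, q in zip(me.location, target))
    return False
```

Calling `move_towards` once per simulation step until it returns `True` gives the directed approach described in the text, in place of waiting for a random collision.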

Case Studies 

Using a game level approach in combination with visual
scripting turns out to be highly flexible in composing
pathways, including 3D visualizations, and enabled us to
implement a number of key cell metabolism processes within
Eukaryo. In this paper, we only describe mRNA transport,
transcription, translation, glycogenolysis, and the effect of
pH on carbonic anhydrase in greater detail. Yet, the underlying
mechanisms have been applied to represent other processes
within Eukaryo at similar levels of detail, such as peroxide
decomposition and microtubule assembly.

Figure 6: Event Graph for Glycogen Phosphorylase implemented as
a Blueprint visual script. While all Actors use similar
blueprints, their behaviours may be further specified: for
instance, in glycogen phosphorylase, the enzyme will only
continue breaking down glycogen into G1P provided that the
glycogen chain has not been consumed. After a certain period of
inactivity, the enzyme will be broken down.

Case Study 1: Transcription and Translation 

Within the nucleus game level (Fig. 3), mRNA is represented by
a single Actor with a preset number of static mesh components.
An mRNA Actor spawned into the level makes its segments
visible, one at a time, to mimic the synthesis of individual
mRNA segments during transcription (Fig. 7a). Once all segments
are visible, the entire strand detaches from RNA polymerase.
Beginning with the THO complex (Fig. 7b), transport signals can
now identify and bind to mRNA (Carmody and Wente, 2009). The
THO Actor continuously looks through the level for any nascent
mRNA to bind to. When a binding partner is found, the Actor is
moved to the mRNA, where, on collision, THO is attached to it.
Now with bound THO, mRNA is ready for a UAP56 signal protein to
bind (Fig. 7c). Once UAP56 is bound, ALY binds in a similar
fashion (Fig. 7d). When Nxf1-Nxt1 binds, the other Actors are
detached from the mRNA, which is then ready to be transported
out of the nucleus and undergo additional processing for
translation (Fig. 7e). Once an mRNA Actor is transported into
the cytosol and tagged by Gle1 and Dbp5, it is ready for
translation (Fig. 7f).
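
The ordered recruitment of export factors described in this case study behaves like a small state machine. The sketch below is one possible reading of that sequence and is not taken from Eukaryo's Blueprints; the class `MRNA`, its method names and the `EXPORT_ORDER` list are assumptions introduced for illustration.

```python
from typing import List, Optional

# Binding order as described in the text; the list itself is an illustrative
# encoding, not a data structure from the original implementation.
EXPORT_ORDER = ["THO", "UAP56", "ALY", "Nxf1-Nxt1"]


class MRNA:
    def __init__(self) -> None:
        self.bound: List[str] = []
        self.export_ready = False

    def next_partner(self) -> Optional[str]:
        """Name of the only factor that may bind next, or None when done."""
        if self.export_ready:
            return None
        return EXPORT_ORDER[len(self.bound)]

    def try_bind(self, factor: str) -> bool:
        """Attach a factor on collision, but only if it is next in the sequence."""
        if factor != self.next_partner():
            return False
        if factor == "Nxf1-Nxt1":
            # Binding of the export receptor releases the earlier factors.
            self.bound = [factor]
            self.export_ready = True
        else:
            self.bound.append(factor)
        return True
```

A factor Actor that collides with the mRNA out of turn is simply rejected by `try_bind`, which keeps the visual sequence in Figure 7 in order.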

Ribosomes carry out translation in the cell level. After a
ribosome Actor finds an mRNA strand, it follows a spline along
the mRNA and synthesizes amino acid chains that “fold” into a
protein once the end of the spline is reached. The mRNA Actors
are animated to improve their visual impact within the
simulation; their child mesh components are positioned along an
animated spline. This spline also forms the path for the
ribosome Actor to move along during translation (Fig. 8a).
Following translation, exosome complex Actors “degrade” the
mRNA so that mRNA does not accumulate within the cell (Fig. 9).
Exosome complexes are responsible for mRNA degradation,
removing inactive mRNA as a part of mRNA turnover (Makino et
al., 2013).

Figure 7: A step-by-step illustration of mRNA export.
(a) mRNA transcription; (b) THO binds; (c) UAP56 recruited;
(d) ALY recruited; (e) Nxf1-Nxt1 signals export; (f) Gle1 and
Dbp5 bind.

Figure 8: Translation and Glycogenolysis. (a) mRNA translation;
(b) amino acids fold into PKA; (c) PKA activates PK; (d) PK
activates GP; (e) GP breaks glycogen down to G1P; (f) conversion
of G1P to G6P.

Each mRNA Actor possesses an attribute that specifies which
protein the ribosome produces during translation. In the
current version of Eukaryo, mRNA transported into the cytosol
encodes adenylate cyclase, protein kinase A (PKA) and
phosphorylase kinase, which are signal proteins for
glycogenolysis. Adenylate cyclase converts adenosine
triphosphate (ATP) into cyclic AMP (cAMP), which activates PKA,
which, in turn, activates phosphorylase kinase (Alberts et al.,
2015). mRNA also codes for the glycogen phosphorylase enzyme,
which is activated by phosphorylase kinase and carries out
glycogen breakdown, as described in the next section.

Case Study 2: Glycogenolysis 

As a second case study, we present the replication of the
glycogenolysis pathway. After translation (Fig. 8a) and protein
folding (Fig. 8b), the signal proteins adenylate cyclase,
protein kinase A, phosphorylase kinase and glycogen
phosphorylase participate in the regulatory pathway that leads
to the release of glucose-1-phosphate (G1P). We have
implemented this pathway and describe it here in detail
(Venkataraman and Luck, 1949).

Protein kinase A is activated by cyclic AMP (cAMP, a second
messenger synthesised by adenylate cyclase) (Lodish et al.,
2012). Once activated, protein kinase A phosphorylates
phosphorylase kinase to stabilise it (Fig. 8c). In turn,
phosphorylase kinase activates glycogen phosphorylase
(Fig. 8d), which can begin cleaving glucose units from
glycogen: G1P is formed (Fig. 8e). Phosphoglucomutase converts
G1P into glucose-6-phosphate (G6P), a reactant in glycolysis
(Fig. 8f). The visual scripts to control these reactions are
depicted in Figure 6.

Figure 9: Glutathione Peroxidase within the Cell Level. An
exosome complex can be seen (in grey) left of glutathione
peroxidase.

Case Study 3: pH Effect on Carbonic Anhydrase 

It has been found that varying the pH level in a eukaryotic
cell’s environment alters its enzymatic activity. This was also
replicated in a mathematical model by Khalifah and Edsall
(1972). We followed their reaction schema and implemented a C++
Blueprint Function that calculates the resulting
Michaelis-Menten ($pK_m$) and catalytic ($k_{cat}$) constants
based on the pH value (Fig. 10). The Michaelis-Menten constant
$K_m$, reported here as $pK_m = -\log_{10} K_m$, is the
substrate concentration that results in a reaction rate of half
the maximum rate; it is inversely related to a substrate’s
affinity for an enzyme. A smaller $K_m$ (larger $pK_m$)
corresponds to a reaction that approaches the maximum rate of
reaction more quickly. The catalytic constant, $k_{cat}$,
describes how fast an enzyme may react with substrates to form
the product. Specific values for $pK_m$ and $k_{cat}$ can be
calculated using rate constants as described in (DeVoe and
Kistiakowsky, 1961).

The Blueprint Function is attached to a level manager Actor
that takes the environment pH to calculate $pK_m$ and $k_{cat}$
(Fig. 4). The values are sent to all of the carbonic anhydrase
Actors in the scene, thus controlling their rate of reaction.
Each carbonic anhydrase Actor has an attached particle emitter
which releases bicarbonate ions at a rate corresponding to the
calculated reaction rate.
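
To make the data flow concrete (pH in, kinetic constants out, reaction rate driving an emitter), here is a hedged Python sketch. It does not reproduce the Khalifah and Edsall scheme: as a simplifying assumption, $k_{cat}$ follows the titration curve of a single ionisable group and $pK_m$ is held constant. All function names and numeric defaults are placeholders, not values from the paper.

```python
def basic_fraction(ph: float, pka: float) -> float:
    """Fraction of a single ionisable group in its deprotonated form at this pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def kinetic_constants(ph: float, pka: float = 7.0,
                      kcat_max: float = 1.0e6, pkm: float = 2.0):
    """Return (pK_m, k_cat) for a given pH. Defaults are illustrative placeholders."""
    kcat = kcat_max * basic_fraction(ph, pka)
    return pkm, kcat


def reaction_rate(substrate: float, enzyme: float, pkm: float, kcat: float) -> float:
    """Michaelis-Menten rate v = k_cat * [E] * [S] / (K_m + [S])."""
    km = 10.0 ** (-pkm)   # convert pK_m back to a concentration
    return kcat * enzyme * substrate / (km + substrate)


def emitter_rate(ph: float, substrate: float, enzyme: float) -> float:
    """Rate a (hypothetical) particle emitter would use for bicarbonate release."""
    pkm, kcat = kinetic_constants(ph)
    return reaction_rate(substrate, enzyme, pkm, kcat)
```

In this arrangement the level manager would call `kinetic_constants` once per pH change and pass the result to each enzyme Actor, so the agents' decision logic stays untouched while their emission rate follows the mathematical model.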

As illustrated in Figure 10, we found that our model
qualitatively aligns with the values reported in (Khalifah and
Edsall, 1972). The $pK_m$ and $k_{cat}$ values in our model
suggest higher affinities and reaction turnovers. Further
adjustments to the experimental parameters will need to be made
to provide more accurate representations.




Figure 10: pH affecting carbonic anhydrase. Left: Khalifah and
Edsall model; right: replicated Eukaryo model. This
deterministic model controls the visual effects and average
carbonic anhydrase agent concentrations (i.e., activity) in the
simulated cell.

The inclusion of a Michaelis-Menten model in Eukaryo
illustrates how mathematical models can be incorporated into
agent-driven models: functions that keep track of reaction
rates and concentration changes are used to manipulate
attributes of individual agents, rather than directly
influencing their decision functions. While our model is
relatively simple for now, this approach can be applied to more
detailed models. For example, C++ libraries can be imported to
construct functions that solve differential equations and
generate biologically relevant outputs for predictive
modelling, similar to those seen in programs such as Virtual
Cell (Loew and Schaff, 2001) and other studies (Esmaeili et
al., 2015; Sarpe and Jacob, 2013).

Virtual Cell Spaces 

In a gaming environment with immersive visualization and
real-time interaction, one has to think about how to visualize
the “cellular universes”, how to navigate through them, and how
to create convincing, yet scientifically justified effects to
highlight the implemented biomolecular processes.

Visualisations 

Similar to illustrations found in textbooks, organelles in
Eukaryo have been given specific colours to ensure that they
are distinguishable from one another, while simultaneously
providing some indication of their function (Fig. 1). The
nucleus and endoplasmic reticula are purple for proximity to
the cell’s core and involvement in biomolecule synthesis. The
Golgi apparatus and vacuoles are green: both are involved in
storage and transport. Mitochondria are orange, standing out
from the other organelles. Together, they provide the context
for the processes described in this paper.

Figure 11: Side-by-side comparison of the simulation at
different levels of detail. (a) High detail: all of the
elements are visible to capture the crowded cell environment.
(b) Low detail: all game objects and particle systems are
disabled, allowing users to focus on the cell’s structures.

Particle Systems. To convey that cellular spaces are highly
packed, multiple particle systems are incorporated into
Eukaryo. Typically used for representing transient entities
like smoke and flames, particle systems can be configured to
emit billboards, which are entities that always face the camera
(Reeves, 1983). These billboards are used in Eukaryo to
represent water molecules, sodium, potassium and chloride ions,
as well as albumin, fatty acids and carbon dioxide (Fig. 11).
By utilizing the graphics processing unit (GPU) for rendering,
more than a million particles can be displayed on screen
without compromising performance. This means that the
simulation remains responsive to user input at any point. Each
particle has a random velocity, angular motion and lifespan to
convey a sense of stochasticity that mirrors Brownian motion.

Protein Data. In order to accurately represent each protein in
Eukaryo, we have imported their 3D structures from the Protein
Data Bank (PDB) repository (Westbrook and Fitzgerald, 2003). A
PDB file contains the coordinates, rotation and amino acid
sequence of a protein. The resulting structural information can
be converted into surface meshes and imported into 3D modeling
software (such as Autodesk (Autodesk)). Before import into the
game engine, we apply textures to the mesh to improve a
structure’s visibility in the cell. Hence, we can present
protein structures consistent with accurate, empirical data.
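
As a rough illustration of the structural data involved, the sketch below reads atom coordinates from `ATOM` records in the legacy fixed-column PDB text format. It is not part of Eukaryo's pipeline (the import tooling is not described in the paper); the names `Atom` and `read_atoms` and the sample record are invented for this example.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    name: str      # atom name, e.g. "CA"
    residue: str   # three-letter residue name, e.g. "MET"
    chain: str     # chain identifier
    x: float
    y: float
    z: float


def read_atoms(pdb_text: str) -> List[Atom]:
    """Extract atoms from ATOM records using the fixed PDB column layout."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, header and other record types
        atoms.append(Atom(
            name=line[12:16].strip(),     # columns 13-16
            residue=line[17:20].strip(),  # columns 18-20
            chain=line[21],               # column 22
            x=float(line[30:38]),         # columns 31-38
            y=float(line[38:46]),         # columns 39-46
            z=float(line[46:54]),         # columns 47-54
        ))
    return atoms


# A made-up record, built with a format string so the columns line up exactly.
SAMPLE = "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
    1, " N", "MET", "A", 1, 27.340, 24.430, 2.614)
```

The resulting coordinate list is the kind of input from which a molecular surface mesh could then be generated in separate tooling.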

Navigation and Level of Detail Controls 

In comparison with other virtual environments that depict
entities such as buildings and terrain, traditional notions of
up and down are not apparent within a cell as familiar
reference points are absent. This makes it more difficult for
users entering the virtual cell spaces to orient themselves
using visual cues present in the cell alone. Therefore,
additional navigational aids were implemented. A minimap is
displayed in the lower left-hand corner, where a cursor
indicates the current camera position (Fig. 2). A slider allows
the user to hide or show varying levels of detail (Fig. 11). At
the maximum level, all proteins and molecules are visible. A
lower level of detail reduces the number of visible elements.
Actors that are invisible still continue running in the
background, so the simulation itself is never interrupted.
Lastly, users can highlight and bring up information about any
entity they are looking at or approaching within the cell.
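
The separation between what is drawn and what is simulated can be sketched as follows. This is an assumption-laden illustration, not Eukaryo's implementation: `SimActor`, `detail_rank`, `set_detail_level` and `simulate_step` are hypothetical names, and the integer ranking scheme is invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SimActor:
    name: str
    detail_rank: int      # 0 = always shown; higher = shown only at higher detail
    visible: bool = True
    ticks: int = 0

    def tick(self) -> None:
        """Advance this actor's simulation state by one step."""
        self.ticks += 1


def set_detail_level(actors: List[SimActor], level: int) -> None:
    """Show only the actors whose rank is within the slider level."""
    for actor in actors:
        actor.visible = actor.detail_rank <= level


def simulate_step(actors: List[SimActor]) -> None:
    """Update every actor, whether or not it is currently rendered."""
    for actor in actors:
        actor.tick()
```

Because `simulate_step` never consults `visible`, lowering the slider changes only what is rendered and leaves the model's state evolution unchanged.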

Conclusion 

We have described an implementation of Eukaryo, an
interactive, hybrid 3D model of a eukaryotic cell created in
the Unreal Engine game environment. Using an agent-based
approach to describe biological processes as a series of shared
interactions between Actors makes our model highly versatile.
We have highlighted some of the pathways and processes
implemented in Eukaryo: gene expression with transcription,
translation, and mRNA transport; glycogenolysis; and pH effects
on carbonic anhydrase activity. Built on a game engine
architecture, the Eukaryo system incorporates mathematical
models as additional components, thus further extending the
expressiveness and accuracy of the simulations. By combining
state-of-the-art biological modelling with 3D visualization and
real-time interactivity, Eukaryo provides an immersive cell
model that can serve as a learning tool as well as an
environment to perform virtual experiments.

The Eukaryo software, videos and virtual experiments are 
available on the Lindsay Virtual Human website. 
Abstract

Thyroid cancer is a common endocrine system neoplasm
characterized by being extremely heterogeneous and of
unexplained incidence (idiopathic). Some subtypes of thyroid
cancer are more aggressive than others, and for this reason
treatment needs to be differential. Nonetheless, due to its
inherent variability, prognosis based on pathology and/or
biochemical profiling often fails, leading to a delay in proper
therapeutics that significantly increases the associated
mortality. The most aggressive thyroid tumors are characterized
by an increase in the destruction of extracellular matrix,
which is carried out by the matrix metalloproteinases (MMPs).
The regulation of MMPs is finely tuned by several molecules,
but the dynamical mechanisms which control this pathway are
still unknown. Here, based on detailed molecular interaction
information coming from functional tests and gene expression
experiments, we develop a Boolean model of the matrix
metalloproteinases pathway in thyroid cancer. By observing the
steady-state conditions reached when the network is perturbed
to simulate a specific drug, we find that TNFA could be a major
target of this pathway. The approach performed here could help
to understand the finely regulated process that maintains
extracellular matrix homeostasis.

Introduction 

Thyroid carcinomas (TCs) are the most common endocrine-related
cancers. In recent years, the incidence of TCs has drastically
increased, while mortality rates remain largely unchanged
(Davies and Welch, 2006; Veronese et al., 2015). Sub-optimal
diagnostics have led to thousands of TC-related deaths
annually, in particular for the case of poorly differentiated,
anaplastic and medullary cancers whose etiology remains to be
fully disclosed (Giuffrida and Gharib, 2000; Hernandez-Lemus
and Mejia, 2012). A number of environmental and nutritional
factors have been statistically associated with TCs. It is
believed that any process leading to compensatory increases in
the hormone thyrotropin will increase the risk of thyroid
tumors. Apart from this, an elevated risk has been documented
for women who use estrogen for gynecological reasons,
especially those in pre-menopause stages (Ron et al., 1987;
Hima and Sreeja, 2015).


One of the main challenges to thyroid neoplasm prognostics and
therapeutics is the enormous variability that the tumors
present both at the cellular morphology and at the gene
expression levels (Espinal-Enriquez et al., 2015). Tumor
heterogeneity generates two main classes of problems for
clinicians. The first is the determination of the specific type
of tumor, since fine-needle aspiration cytology is conclusive
in only around 20% of the cases. The second is the
determination of the most aggressive tumor subtypes (usually
anaplastic and some papillary) to decide which patients are
candidates for pharmacological therapy with close follow-up and
which ones should undergo invasive and costly surgical
procedures (Baudin and Schlumberger, 2007).

It is known that thyroid carcinomas present quite complex
patterns of evolution (Espinal-Enriquez et al., 2015). The
entangled dynamics of these tumors is shaped by a non-trivial
interplay of genomic and epigenomic changes including not only
mutations and gene expression changes but also effects of copy
number variations, gene translocations, methylation
deregulation and altered signaling pathways. A detailed
analysis of all the aforementioned features seems unattainable;
however, a complex systems approach based on integrating most
of this information into a simplified dynamic model may allow
us to have a global but coarse-grained view of the
phenomenology behind thyroid cancer evolution.

Histologically, thyroid carcinomas are classified into
follicular adenoma (FA), follicular thyroid cancer (FTC),
papillary thyroid cancer (PTC) and anaplastic thyroid cancer
(ATC). FTC and PTC are differentiated tumors with low risk of
recurrence and good prognosis. On the other hand, ATC is more
aggressive and usually diagnosed at an advanced stage,
therefore frequently leading to a fatal prognosis (Baudin and
Schlumberger, 2007). A transition from FTC to PTC and PTC to
ATC has been reported depending on the differential expression
of molecules related to degradation of extracellular matrix
(Espinal-Enriquez et al., 2015).

A relevant feature present in the most aggressive thyroid
carcinoma and also a well-known hallmark of cancer (Hanahan and
Weinberg, 2000, 2011) is the invasiveness and migration of
tumor cells via degradation of extracellular matrix (ECM) (Gu
et al., 2005; Ii et al., 2006). The components of ECM,
collagen, elastin and gelatin, are degraded by the matrix
metalloproteinases (MMPs). The activity of MMPs is negatively
controlled by the tissue inhibitors of matrix
metalloproteinases (TIMPs). So far, four TIMPs have been
described and characterized: TIMP-1, TIMP-2, TIMP-3 and TIMP-4.
The interplay between MMPs and TIMPs in normal conditions
maintains the balance of extracellular matrix. However, in
metastatic tumors, overexpression of MMPs leads to an
exacerbated destruction of ECM and the consequent invasion of
adjacent tissues (Jacomasso et al., 2014). MMPs are also
negatively regulated by other molecules, such as A2M, TSP2,
TFPI2 and RECK (Krady et al., 2008; Oh et al., 2001). MMPs are
positively regulated by TGFB (Kim et al., 2004), plasmin
(Ramos-DeSimone et al., 1999) or furin (Remacle et al., 2006).
Other molecules participating in the matrix metalloproteinases
pathway are α1CT, α1PI and α2AP, which regulate the levels of
plasmin, as well as ADAM17, LRP1 and ELANE (Figure 1).

The dynamics of this pathway are still unknown. Understanding
the temporal behavior of all the elements in this pathway is
crucial to obtain a better picture of ECM maintenance in
cancer. To achieve this, it is necessary to construct a model
which captures the dynamics of the participants in the MMP
pathway. Here we construct a Boolean network model of the MMP
pathway (Figure 1), based on the known relationships among
components. We also build on previous work (Espinal-Enriquez et
al., 2015) in which we proposed a transition between thyroid
carcinoma subtypes depending on the expression of the molecules
involved in the MMP pathway.

In the next section we will introduce the main theoretical tool
we will use to tackle this problem, namely a discrete dynamics
model for the MMP network. This class of models represents a
dynamical systems approach to simulate the time evolution of
cellular phenotypes constrained by a set of cellular
biochemical processes subject to environmental influences.
After that, we will present the specific model of a Boolean
dynamical network of thyroid cancer invasiveness, whose main
results, in particular those related to a classification of the
more aggressive (and invasive) phenotypes for TC, will be
discussed in the following section. Finally we will further
elaborate on the biomedical implications of our findings and
the usefulness of such dynamical systems approaches to
understand complex diseases.

Discrete network dynamics 

Several approaches have been developed to model the dynamic
evolution of complex systems based on partial knowledge about
them. One particularly powerful yet somewhat simple approach is
given by sequential dynamical systems, in particular discrete
dynamical networks (Kauffman, 1969; Aldana and Cluzel, 2003),
also known as Boolean networks. Such systems possess
mathematical features that make their implementation
computationally effective for medium-sized problems while also
providing stable results. At the same time, discrete dynamical
networks allow for the codification of functional information
(such as that obtained from experimental data) in the form of
logical rules (Zinovyev et al., 2012) which can be curated and
refined by a close interplay between experimentalists and
theoreticians (Espinal et al., 2011). Concretely, a Boolean
model is constructed based on the assumption that any element
in the network may have two possible dynamical states: active
or inactive; open or closed; expressed or inhibited; etc. These
values can be translated to 1 or 0. The dynamical state of any
node will depend on the state of its regulators, which are
other nodes in the network directly connected to it.

In a formal description, the dynamical state of the network 
consists of a set of N discrete variables {ai, < 72 ,..., a n }, 
each representing the aforementioned dynamical state of the 
node. The state of each node cr n (0 or 1) is determined by 
its set of regulators. We denote as a ni , cr n2 ,..., cr nk the k 
regulators of a n . Then, at each time step the dynamical state 
of a n will be given by 

a n (t + 1) = F n (<r ni (t), a n2 a nk (t )), 

where $F_n$ is a regulatory function constructed by taking into
account the activating/inhibiting nature of the regulators.
Each node has its own regulatory function. It is worth noting
that the curation of the set of logical rules was firmly
grounded in the data-driven analysis of whole-genome gene
expression experiments (Espinal-Enriquez et al., 2015) via
global statistics and causal network inference (Kramer et al.,
2014).
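The update rule above can be sketched in a few lines of Python. This is only a minimal illustration: the three nodes `A`, `B`, `C`, their regulators and their logical rules are hypothetical and are not taken from the thyroid network described in this paper.

```python
# Hypothetical three-node network; names and rules are illustrative only.
REGULATORS = {
    "A": ("B", "C"),   # A is regulated by B and C
    "B": ("A",),       # B is regulated by A
    "C": ("A", "B"),   # C is regulated by A and B
}

# One regulatory function F_n per node, taking the regulator states as input.
RULES = {
    "A": lambda b, c: int(b and not c),  # activated by B, inhibited by C
    "B": lambda a: int(a),               # copies A
    "C": lambda a, b: int(a or b),       # activated by A or B
}


def step(state):
    """Synchronous update: sigma_n(t+1) = F_n(regulators of n at time t)."""
    return {
        node: RULES[node](*(state[r] for r in REGULATORS[node]))
        for node in RULES
    }


initial = {"A": 0, "B": 1, "C": 0}
following = step(initial)
```

All nodes are updated at once from the state at time t, which is the synchronous scheme implied by the equation; other update schemes would need a different `step`.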

Steady state conditions 

For Boolean networks (a finite number of nodes which take a
finite number of values), all initial conditions lead to a
periodic behavior in which the network configuration is
repeated after a certain number of steps. This pattern of
repeated network configurations is called an attractor. For
the same network, several attractors may coexist. The set of
initial conditions that reach a particular attractor is called
its basin of attraction. The time required to reach the
attractor is known as the transient time, and the number of
iterations between repeated configurations is the period
of the attractor. These values reflect global properties of
the network. In this case the steady states of the network
allow us to identify the temporal behavior of particular
phenotypes, by observing the interplay between
metalloproteinases and their regulators. Furthermore, the
Boolean approach that we have implemented is also able to
determine the steady state conditions after elimination of
one node (in silico knock-outs). This has been done to observe
the global properties of the perturbed network, simulating the
action of a directed drug, the ultimate goal of this model.
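The notions of attractor, transient time, period, basin of attraction and in silico knock-out can be made concrete with a small sketch. The three-node network below is a hypothetical toy example (not the 24-node network of this paper), and forcing a node to 0 is assumed here as the knock-out mechanism.

```python
from itertools import product


def step(state, knocked_out=None):
    """Toy synchronous update for a hypothetical network (a, b, c)."""
    a, b, c = state
    nxt = [b, int(a or c), int(not a)]
    if knocked_out is not None:
        nxt[knocked_out] = 0  # a knocked-out node is forced to stay inactive
    return tuple(nxt)


def find_attractor(state, knocked_out=None):
    """Iterate until a configuration repeats.

    Returns (transient_time, cycle); the period is len(cycle).
    """
    seen = {}
    trajectory = []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state, knocked_out)
    transient = seen[state]
    return transient, tuple(trajectory[transient:])


def basins(n_nodes=3, knocked_out=None):
    """Map each attractor (as a set of states) to the size of its basin."""
    sizes = {}
    for state in product((0, 1), repeat=n_nodes):
        if knocked_out is not None and state[knocked_out] != 0:
            continue  # the knocked-out node is absent from the state space
        _, cycle = find_attractor(state, knocked_out)
        key = frozenset(cycle)
        sizes[key] = sizes.get(key, 0) + 1
    return sizes


wild_type = basins()
without_a = basins(knocked_out=0)
```

Exhaustive enumeration as in `basins` is feasible for small networks; for larger ones the same `find_attractor` routine could be applied to a random sample of initial conditions instead.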
Figure 1: The regulatory network for thyroid cancer invasiveness. A) This biochemical pathway takes place in the plasma
membrane as well as in the extracellular space. Here, the main molecules involved in the pathway are depicted. B) The graph 
consistent with the minimal network representing the invasive phenotype response in thyroid cancer. 
Figure 2: Network evolution pattern. Columns represent
the states in terms of molecule activity: Green = active,
Black = inactive or absent. Rows represent discrete time
steps. Time runs from top to bottom. As can be observed,
after a small number of iterations the network reaches a
steady state condition (black and green squares no longer
change). This is the attractor for that particular initial
condition (first row).
Figure 3: Dynamic landscape for different attractors.
This representation shows the basins of attraction (fan-like
structures). Reading from the outside of the figure
towards the center, each point of a fan-like structure
represents one configuration of the network, while the lines
correspond to time steps of the network dynamics. Two dots are
connected if one of them is the successor of the other under
the dynamics. In this sense, the fan-like structures are sets
of network configurations which converge to one dynamical
network state. Eventually, all configurations converge to an
attractor. In panels A and B the attractor is a single-point
attractor, whereas panels C and D present three-state
attractors. The colors of the links differ for graphical
purposes only.


A Boolean dynamic network model for thyroid 
network invasiveness 

Gene regulatory networks have been used previously
to characterize the complexity associated with thyroid
cancer phenotypes, such as malignancy associated with
mechanisms of cell death resistance (Hernandez-Lemus
and Mejia, 2012). The important role that extracellular
matrix (ECM) maintenance and repair processes play in the
development of invasive tumors, which are the more
aggressive forms of thyroid carcinoma, has also been
discussed (Espinal-Enriquez et al., 2015). Among such
processes, it was established that the regulation of ECM
remodeling by the family of matrix metalloproteinases
(MMPs) and their inhibitors is of foremost importance.

For this network, calculated over all initial conditions ($2^{24}$),
the dynamics converges to 10 different attractors (Figure 3).
Two of the ten attractors have period three, while
the other eight are fixed-point attractors. For clarity, Figure 3
shows only four attractors: two of period 1 (3A and 3B) and
two of period 3 (3C and 3D).


Elimination of nodes highlights the relevance of
TIMP regulation

As mentioned before, the interplay between MMPs
and their inhibitors, the TIMPs, is the most relevant feature
in determining whether the ECM is compromised or not.
To quantify this, we construct an invasiveness
score (IS), which consists of the ratio between the MMP and
TIMP state values once an attractor has been reached.
We eliminate one node of the network at a time, run the
dynamics, and observe whether the IS increases or decreases.
The IS after elimination of each node is presented in Table 1.
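A possible reading of the score is sketched below. The text does not give the exact formula, so the choice of summing the active MMP and TIMP nodes over all states of an attractor is an assumption, as are the function name `invasiveness_score` and the handling of a zero denominator.

```python
def invasiveness_score(attractor, mmp_nodes, timp_nodes):
    """Ratio of active MMP nodes to active TIMP nodes over an attractor.

    `attractor` is a sequence of states, each a dict mapping node name to 0/1.
    Assumption: activities are summed over every state of the attractor.
    """
    mmp = sum(state[n] for state in attractor for n in mmp_nodes)
    timp = sum(state[n] for state in attractor for n in timp_nodes)
    if timp == 0:
        # No active inhibitor: the ratio is unbounded (or undefined if 0/0).
        return float("inf") if mmp else float("nan")
    return mmp / timp


# Hypothetical period-2 attractor over four nodes.
attractor = [
    {"MMP2": 1, "MMP9": 1, "TIMP-1": 1, "TIMP-2": 0},
    {"MMP2": 0, "MMP9": 1, "TIMP-1": 1, "TIMP-2": 1},
]
score = invasiveness_score(attractor, ["MMP2", "MMP9"], ["TIMP-1", "TIMP-2"])
```

Under this reading, a score above the wild-type value means relatively more MMP activity per unit of TIMP activity after the knock-out.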

As can be observed, the IS changes depending on which
element was eliminated. Elimination of the node
corresponding to TIMP-1 yields the largest IS, while the
elimination of TNFA yields the least aggressive situation.
This last result is not intuitive; however, the network
dynamics shows that this is the node whose removal reduces
invasiveness the most. This could be relevant in the context
of directed therapies targeting the nodes whose elimination
causes a decrease in invasiveness.
Table 1: Invasiveness score IS after elimination of one
node in the network dynamics. The row labelled WT gives the
IS of the wild type network (with all nodes present). An IS
above the WT value represents dynamics which are more
invasive after elimination of that node, while a lower IS
means that the deletion of the node decreases the invasiveness
and the concomitant destruction of the ECM. It is worth
mentioning that the least aggressive dynamics is observed when
eliminating tumor necrosis factor α (TNFA).


Knock-out node    IS
TIMP-1            2.2
TIMP-4            2.2
α2m               1.333333333
TIMP-2            1.25
TIMP-3            1
MMP9              0.928571429
LRP1              0.882352941
TSP2              0.882352941
ELANE             0.85
ADAM17            0.833333333
MMP2              0.777777778
WT                0.77173913
PLG               0.764705882
MMPE              0.764705882
Plasmin           0.764705882
TFPI2             0.764705882
α1CT              0.764705882
α2AP              0.764705882
MMP3              0.764705882
MT-MMP            0.705882353
RECK              0.666666667
Furin             0.666666667
MMP14             0.6
TGFB              0.538461538
α1PI              0.486486486
TNFA              0.384615385


Dynamical regime of the network exhibits 
criticality 

Dynamical regime. Discrete networks can operate in
three different dynamical regimes: ordered, critical and
chaotic (Derrida and Weisbuch, 1986; Aldana and Cluzel,
2003). These regimes are characterized by how perturbations
propagate across the network and by whether or not these
perturbations modify the dynamical state of the network.
In the ordered regime, the network is not sensitive to
perturbations. In the chaotic regime, a small perturbation
often generates a perturbation avalanche that grows in time.
In the critical regime, however, small perturbations neither
increase nor decrease in time. A more detailed description
of the dynamical regimes of discrete networks can be found
in Espinal et al. (2011).

To determine the dynamical regime in which this network
is operating, one can compute the Derrida map $M(x)$. This
mapping relates the size of the perturbation avalanche at
two consecutive time steps: $x(t+1) = M(x(t))$. It is
known that $S = \left.\frac{dM}{dx}\right|_{x=0}$ (the slope of this map at
$x = 0$) determines the dynamical regime: ordered if $S < 1$,
chaotic if $S > 1$ and critical if $S = 1$ (Espinal et al., 2011;
Balleza et al., 2008; Derrida and Weisbuch, 1986; Aldana
and Cluzel, 2003).

Quite remarkably, the Derrida map of the MMP network
shows with high accuracy that this network operates in the
critical regime. This is evident from Fig. 4, where the
slope at the origin is close to 1. Systems operating close
to a critical point have remarkable properties that would be
very difficult to understand in the absence of criticality.
In particular, a property of regulatory networks operating
close to criticality that is relevant to the present study
is the interplay between robustness and adaptability
observed in this network. Maintenance of ECM homeostasis
is a major issue for the cell, since the stability
of the tissue structure is highly dependent on the status of
the ECM. Mechanisms to degrade the ECM are necessary but,
at the same time, so is a strong and sophisticated
mechanism providing correct negative feedback on them. The
property of criticality highlights the fact that this network
can evolve with those characteristics (robustness under
perturbations and evolvability to change under certain
conditions) with high accuracy. Therefore, the critical
dynamics revealed in figure 5 is indicative of an optimized
mechanism of degradation and maintenance of the extracellular
matrix via the interplay of MMPs and TIMPs as well as their
regulators.

Perturbation of the network reveals loss of
criticality

As mentioned in the previous section, in a Derrida plot,
pairs of initial states are sampled at defined initial distances,
$H(0)$, from the entire state space, and their mean Hamming
distance, $H(t)$, after a fixed time, $t$, is plotted against the
initial distance $H(0)$. For this case, $t = 1$. A curve
above/below the line $H(1) = H(0)$ (slope 1) reflects
instability/stability, respectively (Kauffman, 1969).
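A Derrida curve for $t = 1$ can be estimated by sampling, as in the minimal sketch below. It works with raw Hamming distances (not normalized by the number of nodes), and the two update rules used at the end (`rotate` and `freeze`) are hypothetical stand-ins chosen only because their curves are known in advance; they are not the MMP network.

```python
import random


def hamming(s, t):
    """Number of nodes whose states differ between two configurations."""
    return sum(x != y for x, y in zip(s, t))


def derrida_curve(step, n_nodes, samples=200, seed=0):
    """Mean H(1) for each initial distance H(0) = 0, 1, ..., n_nodes."""
    rng = random.Random(seed)
    curve = []
    for h0 in range(n_nodes + 1):
        total = 0
        for _ in range(samples):
            s = tuple(rng.randint(0, 1) for _ in range(n_nodes))
            flipped = set(rng.sample(range(n_nodes), h0))
            t = tuple(1 - x if i in flipped else x for i, x in enumerate(s))
            total += hamming(step(s), step(t))
        curve.append(total / samples)
    return curve


def rotate(state):
    """Each node copies its neighbour: perturbations neither grow nor shrink."""
    return state[1:] + state[:1]


def freeze(state):
    """Every node switches off: perturbations die out in one step."""
    return (0,) * len(state)


critical_like = derrida_curve(rotate, n_nodes=4)
ordered_like = derrida_curve(freeze, n_nodes=4)
slope_at_origin = critical_like[1] - critical_like[0]
```

The slope near the origin is approximated here by the first increment of the curve; a curve lying on the diagonal (slope 1) corresponds to the critical case described in the text.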
To investigate the significance of each node in the network,
we calculated a perturbation measure for the individual
nodes by modifying the Derrida plot. Perturbation
calculations were performed between the normal network and
each of the 24 mutated networks. A mutated network for a
specific node has that node forced to the value 0 in
the input and output states, and therefore contains $2^{n-1}$
dynamical states. For the perturbation calculation of each
individual node, we construct a modified Derrida plot.
This modified Derrida plot highlights the effect of a
directed drug whose target is a particular node in the
network. The perturbation calculations were normalized with
respect to the line $H(t) = H(0)$. Figure 5 shows
the modified Derrida map according to Gupta et al. (2007) and
Espinal et al. (2011): after elimination of one node,
the Derrida map is plotted and shows the dynamical regime
of the network under that perturbation. The most important
nodes for achieving the critical regime are those shown in
the figure. Their loss causes chaotic behavior. As can be
observed, the nodes which maintain the dynamical regime
are mainly the TIMPs and MMPs. Their elimination from
the pathway causes a dysregulation which is also observed
in cancer phenotypes.



Figure 4: Critical dynamics for the MMPs signaling
network. Plot of the Derrida map $M(x)$, which relates the size
of the perturbation cascade at two consecutive time steps.
The convergence of this mapping to a stationary value
under successive iterations determines the dynamical regime
in which the network operates. This figure relates an initial
separation $x(t)$ to the separation $x(t + \delta(t))$, ($\delta = 1$),
averaged over all states which are initially separated by $x(t)$.
The slope of the curve is practically 1 in a
sizeable neighborhood of the origin, an indication that this
network operates in the critical regime.

Loss of criticality as a global property of the network may
be indicative of severe damage to the network, which
impedes recovery from a perturbation or, on the other hand,
causes loss of flexibility under certain conditions. Those
nodes whose removal causes chaoticity are thus relevant in
the context of global maintenance of the dynamical features
of the network.

Discussion 

Thyroid cancer is an important disease that involves
several processes related to remodeling of the extracellular
matrix, which results in invasiveness and migration of tumor
cells. The main mechanisms that govern the interplay between
matrix metalloproteinases and TIMPs are not yet fully
understood. In this work we constructed a Boolean network
model of the MMP pathway and observed the steady state
conditions for the wild type network as well as for networks
lacking one node, to simulate the action of a specific drug
or a mutation in one of those elements. We also developed
an invasiveness score, IS, which reflects the ratio between
the action of MMPs and TIMPs once the steady state conditions
have been reached. We observed the property of criticality
in the WT network as well as the loss of this property
after elimination of some particular nodes, mainly TIMPs and
MMPs.



Figure 5: Modified Derrida plots. Analogously to the
Derrida map shown in Figure 4, this modified map relates the
size of perturbations at two consecutive time steps. Curves
above/below zero (black horizontal line) reflect
instability/stability, respectively. Curves above zero
represent those networks in which the knock-out produces a
chaotic regime. Curves close to zero remain in the critical
regime after the knock-out of a node. Curves that do not
change their dynamical regime after node elimination indicate
that the deleted node is not relevant for maintaining the
dynamics, which is why it can be eliminated. The most
relevant nodes for preserving the dynamics of the network are
shown in the upper right part of the figure.

Elimination of nodes in this work was implemented
systematically in order to find crucial elements for the
progression of the disease. This implementation could be
applied elsewhere, in larger networks, to find critical nodes
which are capable of determining the behavior of the whole
network.

An important result is that the smallest IS was obtained after
eliminating tumor necrosis factor α (TNFA). This result
points to a promising therapeutic strategy against the
migration process that occurs in the most aggressive thyroid
carcinomas. Therapies based on blocking TNFA have been
developed for other pathologies, such as rheumatoid arthritis
(Brenner et al., 2015; Keffer et al., 1991). This study could
open an opportunity for anti-TNF therapy. Other
approaches to finding crucial nodes in a Boolean network have
been developed (Kim et al., 2013). An interesting open
question is whether the results observed here could also be
obtained with other methodologies. That is a matter for
further research.

To our knowledge, this is the first time that a discrete
theoretical model has been implemented to understand the
matrix metalloproteinase pathway and, furthermore, the
particular dynamics of thyroid carcinoma progression. The
identification of TNFA as a crucial element for the
progression of this carcinoma could only be achieved with an
approach such as the one presented here.

It is worth mentioning that the criticality exhibited by the WT
network is consistent with the fact that most biological
networks operate in this regime (Balleza et al., 2008;
Shmulevich et al., 2005). Moreover, the loss of this property
after node elimination could be explained as a loss of the
equilibrium between those elements which degrade the ECM and
those which maintain the basal levels of MMPs in order to
preserve the homeostasis of the extracellular matrix.

This kind of approach gives us a more accurate insight
into how the temporal behavior of any biochemical network
can be observed. It is worth mentioning that, although the
majority of reaction rates among the pathway elements are not
known, Boolean modeling only needs the qualitative
nature of the relationships. This is one of the greatest
advantages of this coarse-grained approach. Experimental
procedures must be performed to corroborate the results
observed here. Nonetheless, the Boolean network developed in
this work could suggest directed experiments in order to
understand the complex nature of invasiveness in thyroid
cancer.
Introduction

Animal language may be regarded as a complex adaptive
system. Although there is software for identifying
linguistic units or “phrases” in animal vocalisations,
packages for analysing the grammatical properties of phrase
sequences are scarce. PajaroLoco is an open-source Mathematica
package for the study of these features. This paper is a
demonstration of the capabilities of PajaroLoco, using as an
example the syntax network from a Cassin's Vireo individual.
Our intention is to illustrate how it could be used for
other network analysis tasks in the artificial life community.
It should be noted, though, that it is not our purpose to
give a thorough description of the species' grammar, as we
are currently working on other aspects of that field of study
(Arriaga et al., 2015; Hedley, 2015).

Bird Song Analysis

We will describe one song from a Cassin's Vireo male,
“Mine” (sample id: 1156), singing 559 phrases on 13
May 2013 (for a description of such songs and a link to
recordings see Hedley (2015)). An excerpt of the annotated
sample is: au, aj, ak, ai, da, cj, ch, ci; where each pair
of letters represents a distinct sound composition (these
phrases were obtained a priori by analysing the spectrogram
with different software) and the position of the phrases in
the sequence denotes temporal relationships (au occurs
before aj, ak before ai, etcetera). All of the routines that
will be shown can be performed directly in PajaroLoco
(Sanchez et al., 2015).

Network Representation and Small-World Themes:

The phrase sequences in the song were first represented
as a network. This was achieved by setting the phrases
as vertices in the graph and the transitions between them
as weighted edges (an animation with sound of how this
is done can be seen on the project’s YouTube playlist:
https://www.youtube.com/playlist?list=PLRzY6w7pvIWrQXIcnxN5tKtpIVXjhqOlu).
In figure 1a we can see the resulting network. It is easy
to observe the tendencies of certain phrases to appear in
communities or “themes”. How tightly these themes are
connected can be measured with the “small-worldness”
coefficient, which in this case is 3.23. A random graph
would have a “small-worldness” of approximately 1, so the
song of the bird is not structured in a completely random
manner. Instead, phrases are organised into clusters in
which a collection of phrases is often used together for a
period of time. We also obtained the Markov transition
matrix (figure 1b). It shows how probable it is that the
bird vocalises a certain phrase after he has produced another
given one. We can observe once again that phrases follow
a pattern and are not sung at random. Hedley et al. (2016)
have analysed these data further, and suggested that the best
representation is somewhere between a first- and
second-order Markov process, though some transitions may be
more complex.

(a) Network representation of the transition frequencies
of the song. Vertices are the phrases of the song and
edges represent transitions. The coloured groups
represent the themes or communities of phrases.

(b) Markov transition probability
matrix. The numbers on the frame
are the identifiers of the phrases
and the transition probability is
represented by a colour scale.

Figure 1: Markov and network representation of the song’s
transitions.

(a) Degree frequencies. (b) Probability distribution.
(c) Cumulative probability.

Figure 2: Vertex degree plots.
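The construction of the weighted transition network and of a first-order Markov transition matrix can be sketched as follows. This is written in Python for illustration and does not use or reproduce the PajaroLoco (Mathematica) API; the function names and the short phrase sequence are hypothetical.

```python
from collections import Counter


def transition_counts(phrases):
    """Weighted directed edges: how often phrase u is followed by phrase v."""
    return Counter(zip(phrases, phrases[1:]))


def markov_matrix(phrases):
    """First-order transition probabilities P(next = v | current = u)."""
    counts = transition_counts(phrases)
    totals = Counter()
    for (src, _dst), c in counts.items():
        totals[src] += c
    return {(src, dst): c / totals[src] for (src, dst), c in counts.items()}


# Hypothetical annotated sequence using phrase labels of the same form
# as the excerpt in the text.
song = ["au", "aj", "au", "aj", "ak"]
edges = transition_counts(song)
probabilities = markov_matrix(song)
```

The keys of `edges` are the directed edges of the syntax network and the values are their weights; normalising each row by the out-going total gives the sparse form of the matrix shown in figure 1b.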

Morphological Analysis: While most of the network
analysis takes place in undirected networks, some measures
deal with the in- and out-degree of the vertices. In this
case they are grouped in patterns or “motifs” according
to how they are connected to other phrases (bottlenecks,
hourglasses, one-ways and branches (Sasahara et al., 2012;
Cody et al., 2015)). In our example we used the mean
number of edges (3.05) in the network as the threshold for
the detection of structures (this is a rather arbitrary way
to define the threshold, chosen for demonstration purposes,
but the parameter can be easily changed if desired).
Hourglasses and one-ways were the most common, with 9 phrases,
while bottlenecks and branches were less frequent, with 6
phrases.

Degree Distribution: Another important feature to study
in graphs is whether or not they conform to certain degree
distribution patterns. Figure 2 shows three plots that were
used for this purpose. We see in figure 2a that most phrases
have few connections, although there is another peak
around seven connections, hinting at a bimodal distribution.
This can be further observed in figure 2b. We can also
see in figure 2c that most of the phrases (around 90%) have
between one and eight connections.
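The three quantities plotted in figure 2 (degree frequencies, probability distribution and cumulative probability) can be computed from an edge list as in the sketch below. It is a Python illustration with a hypothetical four-phrase network, not the PajaroLoco implementation, and it treats the network as undirected and ignores self-transitions.

```python
from collections import Counter


def degree_distribution(edges):
    """Return (frequency, probability, cumulative probability) by degree."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set())
        neighbours.setdefault(v, set())
        if u != v:  # self-transitions do not add a connection
            neighbours[u].add(v)
            neighbours[v].add(u)
    n = len(neighbours)
    freq = Counter(len(nbrs) for nbrs in neighbours.values())
    prob, cumulative, running = {}, {}, 0
    for k in sorted(freq):
        running += freq[k]
        prob[k] = freq[k] / n
        cumulative[k] = running / n
    return dict(freq), prob, cumulative


# Hypothetical transitions between four phrases.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
freq, prob, cumulative = degree_distribution(edges)
```

Reading `cumulative[k]` gives the fraction of phrases with at most k connections, which is the kind of statement made about figure 2c.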

Conclusions and Future Work 

Analysing animal language is important from a
computational and complex adaptive systems point of view.
Taylor and Cody highlight this importance by emphasising the
different varieties in which birds’ vocalisations come and
how they compare with other complex phenomena such as
cellular automata (Taylor and Cody, 2015). PajaroLoco is
a tool developed for that purpose. Our program is part of
an ongoing project and as such is updated and documented
regularly. Although its main application is intended to be
the analysis of annotated animal vocalisations, the package
presented here can be used for the study of other complex
adaptive phenomena in artificial life research, especially in
those applications where phenomena can be described as
sequences of elements and are amenable to network analysis.
Abstract

We identify some desired mathematical properties of bonds
in an Artificial Chemistry (AChem) that promote complexity
and open-ended behaviour (i.e. an AChem not designed
to display particular behaviours). We identify the underlying
structures created by different properties of mathematical
products. We use these to exploit existing algebra to generate
a potentially open-ended subsymbolic AChem (ssAChem).
We give examples of how our approach leads to interesting
behaviour, focused on the structure of composite particles
within our system.

A Low Level Approach to Artificial 
Chemistries 

Most Artificial Chemistries (AChems) seek to produce a
system capable of displaying specific behaviours associated
with abiogenesis, the transition from inorganic to organic
(living) materials (Hutton, 2002; Lucht, 2012; Suzuki et al.,
2003). Those systems succeed in generating their particular
behaviours because that is what they are designed to
do. Another approach is to consider that we are seeking
open-ended behaviour in our systems. In order to design
for open-ended behaviour we need to approach the problem
in an open-ended way.

We need to design a system that is rich and complex, with
properties that allow us to define all the reactions of our
AChem implicitly. We can then start looking for, and finding,
behaviours that are emergent from the design, rather
than engineered explicitly. We need a set of building blocks
and connectors that do not limit the structure we design.
Think of this as the difference between a prefabricated house
and a brick house. A prefab has pieces that are specifically
designed to fit together and form a house, and have a limited
capability to do anything else. A brick house is just the
bricks and the mortar that joins them. The bricks are not
limited to building a certain house, or even a house of a
particular size. With enough bricks and mortar the possibilities
are endless. Likewise in an open-ended AChem the only
limit should be the material and the amount of energy in the
system.


We need to consider desirable properties of the interactions
of our particles, rather than of the whole system, while
ensuring that we do not over- or under-constrain the AChem.
Here we do this by taking a mathematical approach, and
taking advantage of existing mathematical theory and
structures. This allows us to discuss not just the properties and
behaviours of the particles, but also the different links and
linking structures between them. We can then use established
mathematics that has many emergent properties with
interesting forms of interactions. We can also expand our
view to talk about the effects of these properties on the
system as a whole.


Terminology 

Dittrich et al. (2001) define an AChem as a triple $(S, R, A)$,
where $S$ is the set of possible molecules, $R$ the set of rules
for binding molecules, and $A$ an algorithm describing the
dynamics of the environment.

Rather than talking of ‘molecules’, we refer to the members
of $S$ as particles; these are either atomic particles
(atoms) or composite particles (composites). Rather than
talking of ‘bonds’, we say that the rules $R$ say how particles
can be joined together with links; links can be broken to
decompose composite particles. We use this terminology to
help prevent confusion between the properties of real chemical
molecules and our AChem particles, and to prevent the
abuse of chemistry terminology.

Faulconbridge et al. (2010) introduce the concept of
subsymbolic AChems (ssAChem), with an example based on
RBN-world. Such AChems have an implicit rule set where
the properties used by the rules emerge from the internal
structure of the particles. RBN-world was further developed
in (Faulconbridge et al., 2010; Faulconbridge, 2011).

Here we demonstrate how the algebraic properties of the
chosen rule set can be exploited to help obtain rich
structures, and demonstrate this with an ssAChem based on a
Jordan algebra of Hermitian matrices.
Mathematical Properties and Structure of
Composite Particles 

Our ssAChem rules have two parts: a set of mathematical
products (or mathematical operations) for forming links
and composites, and a set of probabilities used to determine
the probability of a reaction. In this section we discuss the
properties of the mathematical product.

In mathematics there are two properties of a product on a
set that are easily defined, and that can be indicative of many
further properties of an algebra. These are associativity and
commutativity.

Associativity: $(a \circ b) \circ c = a \circ (b \circ c)$ (1)
Commutativity: $a \circ b = b \circ a$ (2)

When we have a binary product, thereby linking two particles,
combinations of these properties lead to four distinct
structures (Table 1).

For an associative, commutative binary product we can
change the order of evaluation and the ordering within any
evaluation. No matter how we link a given set of particles,
we get the same result. The structure is a bag. For an
associative, non-commutative binary product we can change the
order of evaluation such that there is no ordering on the
products, but we cannot change the ordering within the product;
the structure is a string.

Associativity is an assumed property of most algebras.
Non-associative algebras, while rare, normally appear in an
applied setting. They have been used in connection with
genetics (Reed, 1997) and physics (McCrimmon, 1978) as
well as a broad range of applications to mathematical theory
(Gonzalez and Martinez, 2003). One of their main attractions
is that with their enforced evaluation order they can
embody a loose form of time, or at least an ordering of
interactions.

For a non-associative, commutative product we can
reorder particles in a product, but we have an enforced order
of products. The structure is a binary tree, with unordered
child nodes. For a non-associative, non-commutative
product, we have an enforced order of products and ordering of
particles within the products. The structure is a graph, with
complicated directionality restrictions requiring labelling on
both edges and nodes; these are not simple structures and do
not conform to any of the normally used graph subtypes.
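The first three structures can be illustrated with toy products on symbolic atoms. The sketch below is an assumption-laden illustration, not the ssAChem of this paper: the function names are hypothetical, atoms are single-character strings, and the graph case is omitted because the text gives no concrete representation for it.

```python
from itertools import permutations


def bag_link(x, y):
    """Associative and commutative: only the multiset of atoms survives."""
    return "".join(sorted(x + y))


def string_link(x, y):
    """Associative, non-commutative: concatenation keeps left-to-right order."""
    return x + y


def tree_link(x, y):
    """Non-associative, commutative: an unordered pair of subtrees.

    Sorting by repr gives each unordered pair one canonical form.
    """
    return tuple(sorted((x, y), key=repr))


def composites(link, atoms):
    """All distinct results of linking three atoms with binary links only."""
    results = set()
    for a, b, c in permutations(atoms):
        results.add(link(link(a, b), c))
        results.add(link(a, link(b, c)))
    return results


atoms = ("a", "b", "c")
n_bag = len(composites(bag_link, atoms))
n_string = len(composites(string_link, atoms))
n_tree = len(composites(tree_link, atoms))
```

With binary links only, the three atoms give a single bag, several strings and several trees; the counts are for this toy encoding and do not include products of more than two arguments.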

Let us consider these four structures in terms of an
AChem. A bag has no internal structure, and limits us to
a set of composite particles with the cardinality of the power
set of the component particles. In real chemistry there are
isomers: molecules that contain the same atoms in different
arrangements and therefore have different inherent properties
(Muller, 1994). Isomers add complexity and increase the
size of the combinatorial space. An AChem with a bag
structure has no equivalent of isomers, so we do not want to
base ours on an associative commutative product.


Associativity   Commutativity   Structure
Yes             Yes             Bag
Yes             No              String
No              Yes             Tree
No              No              Graph

Table 1: Summary of the structures provided by different
mathematical properties

Strings are structures that have received a lot of attention
in the computing community, but they are rather simple
mathematical objects that lack room for expansion. They
have a very simple combinatorial power of

$s_n = n!$ (3)

for strings of $n$ elements. Strings support analogues of
isomers, but there are not many of them. There is also no
ordering of operations, so how they are formed does not affect
the result. So we reject associative non-commutative products.

The tree structure given by the non-associative commutative
product not only has more room for expansion to larger
trees, it also has an implicit ordering. Because we cannot
change the order of operations we get a variety of structures,
and a system in which structure is as important as the
building blocks themselves. This gives us a system with greater
intrinsic flexibility.

The graph structure of a non-associative non-commutative
product provides yet more structure, but
makes it hard for the product to have any regularity to
exploit as it allows so many possible structures. It is not
necessary that we work with a structure this complex so we
stick to trees.

Larger products 

We can look beyond binary products to products that take
more arguments, combining multiple particles with a common
link.

This does not affect the structure of the system if we have
an associative non-commutative product: since it is
associative this changes nothing and we still have a string.
However, in the case of the non-associative commutative
product, as we expand from a binary product to a larger
product we move from a binary tree to a general tree.

There are a larger number of possible trees with n > 4 
leaves than strings with n > 4 elements. For n — 3 we have 
<§3 = 6 and ts = 4, where s n is the number of strings with 
n elements and t n is the number of trees with n leaves. For 
77 , = 4 we have 84 = 24 and £4 = 31 using products of any 
size, see Figure 1 . We can show that from this point onward 
there is a larger number of possible trees than strings. 

Figure 1: Tree structures with four leaves, with multipliers
indicating the number of relevant rearrangements of leaves,
giving an indication of all possible trees with four leaves
within this system.

The number of possible strings increases with $n$ such that
$s_{n+1} = s_n(n+1)$. For trees we have a faster growth. We can
show that if we link the extra element to the result of each of
the graphs with $n$ nodes with a binary link then the new element
can be swapped with any of the other elements to give
at least $t_{n+1} \geq t_n(n+1)$. We also always have more graphs
as this does not include the graph of the $(n+1)$-product (see
Figure 2), making $t_{n+1}$ strictly greater than $t_n(n+1)$. Thus
as we have more trees at $n = 4$ and a faster growth in the
trees than in the strings, for $n \geq 4$ we always have more
possible trees than strings.
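The growth argument above can be checked numerically. The following is a minimal sketch, assuming only the values stated in the text ($s_n = n!$, $t_4 = 31$) and the strict inequality $t_{n+1} > t_n(n+1)$; it does not enumerate trees, it only propagates a lower bound, and the function names are illustrative.

```python
import math


def strings(n):
    """Number of strings on n distinct elements: s_{n+1} = s_n * (n + 1)."""
    return math.factorial(n)


def tree_lower_bounds(n_max, t4=31):
    """Lower bounds on t_n for n = 4..n_max, using t_{n+1} > t_n * (n + 1)."""
    bounds = {4: t4}
    for n in range(4, n_max):
        # A strict inequality between integers means "at least one more".
        bounds[n + 1] = bounds[n] * (n + 1) + 1
    return bounds


bounds = tree_lower_bounds(8)
for n in sorted(bounds):
    print(n, strings(n), bounds[n])
```

Even this crude bound stays above the number of strings for every $n \geq 4$, which is all the argument needs.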

In terms of an AChem, these properties show that we 
have a more interesting selection of possibilities in a non- 
associative system than otherwise, and these possibilities are 
controlled by the order in which reactions occur. Hence 
we focus on non-associative commutative products for our 
ssAChem design. 

Jordan Algebras 

Having established that these mathematical properties are
desirable, we need to find a system in which we have these
properties. Mathematics as a field has already found and
studied systems with such properties. In the case of
non-associative commutative systems we have Jordan Algebras
(McCrimmon, 2006).

Figure 2: Trees showing greater growth than strings

Jordan Algebras were originally conceived to find a solution
to describing observables in quantum mechanics, but
were later discarded for that purpose because none of the
Jordan Algebras were able to solve the problem. They have
two important properties which define them:

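Jordan identity: $(x \bullet y) \bullet x^{\bullet 2} = x \bullet (y \bullet x^{\bullet 2})$ (4)

where $x^{\bullet n} = x \bullet x \bullet \cdots \bullet x$ ($n$ times)

Power associative: $x^{\bullet m} \bullet x^{\bullet n} = x^{\bullet (m+n)} \quad \forall m, n > 0$ (5)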

Power associativity tells us what happens when we work 
with just one kind of particle. 

There are several Jordan Algebras (McCrimmon, 2006). 
Here we take the most accessible Jordan Algebra that ex¬ 
ists over the Hermitian matrices (a matrix is Hermitian if it 
equals its Hermitian conjugate, see Equation 14). 

With this Jordan Algebra we start with a binary product 
formed of familiar matrix multiplication and addition to de¬ 
fine the Jordan product: 

$X \bullet Y := \frac{1}{2}(XY + YX)$ (6)

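As one can see, $X \bullet Y = Y \bullet X$. It is also non-associative:

$(X \bullet Y) \bullet Z = \frac{1}{2}(XY + YX) \bullet Z$ (7)
$= \frac{1}{4}(XYZ + YXZ + ZXY + ZYX)$ (8)
$\neq \frac{1}{4}(XYZ + XZY + YZX + ZYX)$ (9)
$= \frac{1}{2} X \bullet (YZ + ZY)$ (10)
$= X \bullet (Y \bullet Z)$ (11)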
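Commutativity and non-associativity of the Jordan product can be checked on concrete matrices. The sketch below uses plain Python lists of complex numbers and takes $X$, $Y$ and $Z$ from the Appendix (Equation 27); the helper names (`matmul`, `jordan`, `close`) are illustrative and not part of the original work.

```python
def matmul(a, b):
    """Ordinary matrix product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, factor):
    return [[factor * x for x in row] for row in a]


def jordan(a, b):
    """Jordan product (Equation 6): half of (ab + ba)."""
    return scale(add(matmul(a, b), matmul(b, a)), 0.5)


def close(a, b, tol=1e-12):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hermitian matrices from the Appendix (Equation 27).
X = [[1, -1j, 1j], [1j, 0, 0], [-1j, 0, 0]]
Y = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
Z = [[-1, 1j, 0], [-1j, -1, 0], [0, 0, 0]]

commutes = close(jordan(X, Y), jordan(Y, X))
associates = close(jordan(jordan(X, Y), Z), jordan(X, jordan(Y, Z)))
print(commutes, associates)
```

For these three matrices the two bracketings differ, so the order in which the binary links are formed changes the resulting composite.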

One of the advantages of a non-associative algebra is the 
ability to expand from the binary product and the binary tree 
it creates to a general product and its general tree. We can 
expand the binary product linearly to give the Jordan triple 
product: 

$\{X, Y, Z\} = X \bullet (Y \bullet Z) + (X \bullet Y) \bullet Z - (X \bullet Z) \bullet Y$
$= \frac{1}{2}(XYZ + ZYX)$ (12)

We can further extend this to an arbitrary length n product, 
called an n-tad in Jordan theory (McCrimmon, 2006): 

$\{X_1, X_2, \ldots, X_n\} = \frac{1}{2}(X_1 X_2 \cdots X_n + X_n \cdots X_2 X_1)$ (13)
Using the n-tad notation, $(X \bullet Y) = \{X, Y\}$.

Commutativity of this product means that we can fully re¬ 
verse the order of the elements in the product, but not freely 
rearrange the order completely. So there is a large number of 
possible n-tad products for a particular set of n objects, in¬ 
creasing our combinatorial power and the ability of our sys¬ 
tem to exploit some properties of composite particles. Thus 
Jordan Algebras equip us with products that are open-ended, 
and are applicable to the open set of Hermitian matrices. 
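As a sketch of the n-tad (Equation 13), the following pure-Python code builds the product for any number of arguments and checks it against the binary-product expansion of the triple product (Equation 12). The matrices are $X$, $Y$, $Z$ from the Appendix; all function names are illustrative assumptions.

```python
from functools import reduce


def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, factor):
    return [[factor * x for x in row] for row in a]


def close(a, b, tol=1e-12):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def jordan(a, b):
    """Binary Jordan product (Equation 6)."""
    return scale(add(matmul(a, b), matmul(b, a)), 0.5)


def ntad(*ms):
    """n-tad (Equation 13): half of the forward product plus the reversed product."""
    forward = reduce(matmul, ms)
    backward = reduce(matmul, reversed(ms))
    return scale(add(forward, backward), 0.5)


X = [[1, -1j, 1j], [1j, 0, 0], [-1j, 0, 0]]
Y = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
Z = [[-1, 1j, 0], [-1j, -1, 0], [0, 0, 0]]

# Right-hand side of Equation 12, written with binary products only.
from_binary = add(add(jordan(X, jordan(Y, Z)), jordan(jordan(X, Y), Z)),
                  scale(jordan(jordan(X, Z), Y), -1))

print(close(ntad(X, Y, Z), from_binary))    # triple product identity
print(close(ntad(X, Y, Z), ntad(Z, Y, X)))  # full reversal leaves it unchanged
print(close(ntad(X, Y, Z), ntad(Y, X, Z)))  # a partial reordering does not
```

The last comparison illustrates the point made above: reversing the whole argument list is free, but other rearrangements generally give a different composite.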

Mathematical Objects 

Other AChems have used ‘matrices’ as the basis of their set 
S. In particular, the binary string chemistry (Banzhaf, 1993), 
dubbed the matrix-multiplication chemistry by Dittrich et al. 
(2001), makes use of matrix multiplication. However, it 
does not treat its particles as mathematical objects; rather, 
it folds binary strings into a matrix in order to give a sim¬ 
pler definition of a function over the binary strings. This is 
common for the use of ‘matrices’ in systems that use ‘ma¬ 
trix’ to mean a two dimensional storage array rather than the 
mathematical object that we use here. 

All of the previous discussion in this paper has been build¬ 
ing towards creating a system that uses mathematical objects 
for both the particles and links of our system. This is the 
beauty of a mathematical product: it is in some ways an ob¬ 
ject with properties in its own right. 

Additionally, the matrices themselves are rich in emergent 
properties that might be exploited by our system. 

Hermitian Matrices and Subsymbolic Artificial 
Chemistries 

The atoms in the Jordan ssAChem used here are 3x3 Her¬ 
mitian matrices. 

Hermitian matrices use the Hermitian conjugate of a com¬ 
plex matrix: 

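$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}^{\dagger} = \begin{pmatrix} \bar{a}_{11} & \bar{a}_{21} & \bar{a}_{31} \\ \bar{a}_{12} & \bar{a}_{22} & \bar{a}_{32} \\ \bar{a}_{13} & \bar{a}_{23} & \bar{a}_{33} \end{pmatrix}$ (14)

The elements $a_{ij}$ are complex numbers, and $\bar{a}$ is the complex
conjugate of $a$. A matrix $M$ is Hermitian if $M = M^{\dagger}$.
Hermitian matrices are closed under the Jordan product
(McCrimmon, 2006).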

Hermitian matrices provide a rich variety of properties 
such that we can use them as prime material for creating a 
subsymbolic AChem (ssAChem) where emergent properties 
of the matrices dictate the linking capabilities/probabilities 
of a particle, and the algebra gives the structure of the com¬ 
posite particles. 

In this work we use the eigenstates of the Hermitian Ma¬ 
trices, chosen for their dimensionality and spatiality, and be¬ 
cause they are a well studied mathematical object. A fully 
worked example of linking, probabilities and strengths gen¬ 
erated using Hermitian matrices is given in the Appendix. 


Subsymbolic Link 

We make use of the eigenstate of the matrix to define linking 
probabilities. For a matrix M, consider 

$Mv = \mu v$ (15)

The solution vectors $v_i$ are the eigenvectors; the corresponding
scalars $\mu_i$ are the eigenvalues. Here we choose these unit
eigenvectors and the corresponding normalised eigenvalues
$\lambda_i$ as our emergent properties of interest to define our linking
probabilities:

$\lambda_i = \ldots$ (16)

We normalise the eigenvalues to ensure sensible linking 
probabilities of larger composites. 

The probability of two particles $A$ and $B$ linking, based
on a given pair of eigenvalues and eigenvectors, is defined
to be:

$p_{A_i B_j} = \mathcal{N}(\lambda_{A_i} - \lambda_{B_j}) \left(1 - \frac{1}{2}\left((v_{A_i} \cdot v_{B_j}) + 1\right)\right)$ (17)

This has two parts. The first term $\mathcal{N}(\lambda_{A_i} - \lambda_{B_j})$ is the
probability density of the normal distribution ($\mu = 0$, $\sigma = 1$)
at the point given by the difference in the normalised
eigenvalues. This means the probability of linking is larger for
more similar normalised eigenvalues. The normal distribution
is not the only option; we simply need a symmetric
distribution centred on zero, and the normal distribution is a
well-known example.

The second term $\left(1 - \frac{1}{2}\left((v_{A_i} \cdot v_{B_j}) + 1\right)\right)$ uses the dot
product between the corresponding unit eigenvectors. The
dot product between two unit vectors is the cosine of the
angle between them. The overall term has a value between
0 and 1, and is 0 if the vectors are perfectly aligned and 1 if
the vectors are anti-aligned.

The probability of two particles linking, $p_{AB}$, is defined
as the maximum probability over all the possible pairs:

$p_{AB} = \max\{p_{A_i B_j} \ \forall i, j\}$ (18)

The other property we define from the eigenstate is the
strength of the link (the probability that the link does not
decompose). This is based solely on the difference in the
normalised eigenvalues:

$l_{A \bullet B} = \mathcal{N}(\lambda_{A_i} - \lambda_{B_j})$ (19)

Both these properties are based on the binary product. We
define the linking probability of a triple link to be:

$p_{ACB} = \min\{p_{AC}, p_{CB}\}$ (20)

and we use the same set of eigenvalues to generate the link's
strength as the minimum of the strengths for each of the pairs
$[\lambda_{A_p}, \lambda_{C_q}]$ and $[\lambda_{C_r}, \lambda_{B_s}]$:

$l_{\{ACB\}} = \min\{\mathcal{N}(\lambda_{A_p} - \lambda_{C_q}), \mathcal{N}(\lambda_{C_r} - \lambda_{B_s})\}$ (21)

This can be extended to the n-tad case in a similar fashion.
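Equations 17 to 21 can be sketched in a few lines. The code below is illustrative only and rests on two assumptions that are not stated in the text: the caller supplies already normalised eigenvalues, and for complex eigenvectors the dot product is taken as the real part of the Hermitian inner product. The function names are hypothetical.

```python
import math


def normal_pdf(x):
    """Density of the standard normal distribution (mean 0, deviation 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def dot_real(u, v):
    # ASSUMPTION: real part of the Hermitian inner product for complex vectors.
    return sum(complex(a).conjugate() * complex(b) for a, b in zip(u, v)).real


def pair_probability(lam_a, v_a, lam_b, v_b):
    """Equation 17 for one (eigenvalue, eigenvector) pair from each particle."""
    alignment = 1 - 0.5 * (dot_real(v_a, v_b) + 1)
    return normal_pdf(lam_a - lam_b) * alignment


def link_probability(eig_a, eig_b):
    """Equation 18: best pair over all eigenstates of A and B."""
    return max(pair_probability(la, va, lb, vb)
               for la, va in eig_a for lb, vb in eig_b)


def link_strength(lam_a, lam_b):
    """Equation 19: depends only on the normalised eigenvalue difference."""
    return normal_pdf(lam_a - lam_b)


def triple_probability(p_ac, p_cb):
    """Equation 20: the weaker of the two pairwise probabilities."""
    return min(p_ac, p_cb)


# Second eigenstates of X and Y from the Appendix (both eigenvalues are 0).
r = 1 / math.sqrt(2)
x2 = (0.0, (0, r, r))
y2 = (0.0, (r, -r, 0))

p = pair_probability(*x2, *y2)
print(round(p, 4), round(link_strength(x2[0], y2[0]), 4))
```

With these inputs the sketch reproduces the Appendix values for link 1, a probability of about 0.2992 and a strength of about 0.3989.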
Composite Particles Probability

In our system the links have properties of their own. Equa¬ 
tion 18 is the probability of the link forming given the pres¬ 
ence of its components and that we choose that many reac¬ 
tants. 

We need a further two probabilities to work out the probability
of the composite resulting from the link existing, $f_A$. We
need a probability of a particle existing, $e_A$:

$e_A = \begin{cases} 1 & \text{if } A \text{ is an atom} \\ f_A & \text{if } A \text{ is a composite} \end{cases}$ (22)

And we need a probability that each particle takes part in 
the reaction. All reactions that form a link require us to se¬ 
lect at least two components. We define the probability of 
selecting further components for the reaction as 0.1 for each 
component. This choice discourages our system from con¬ 
stantly forming large links and quickly becoming one large 
composite. 

$r_{\{A_1, \ldots, A_n\}} = 0.1^{n-2}$ (23)

So together the probability of a composite forming in terms
of its last link is:

$f_X = f_{\{A_1, \ldots, A_n\}} = p_X r_X \prod_{i=1}^{n} e_{A_i}$ (24)

Figure 3: The set of structural isomers of four identical
atoms


Isomer   Probability   Strength   Maximum reaction size   No. links

I1       0.0020        0.3989     4                       1

I2       0.0040        0.1592     3                       2

I3       0.0040        0.1592     3                       2

I4       0.0079        0.0635     2                       3

I5       0.0079        0.0635     2                       3


Table 2: Particle probability and particle strength for the
isomers of the identity atom (isomers 1 to 5 with X = I).

Composite Particles Strength

Link strength in our system is truly a property of the link
rather than the composite. The composite particle contains a
series of links all with different properties; each link has the
link strength given in Equation 19. The overall composite
strength is given by the probability of each of the links not
decomposing, and is the product of all the link strengths in
the composite. The result is that larger links are stronger than
a series of smaller links as there is less chance for the
composite to break down. This is given by:

$s_X = \prod_{Y \subseteq X} l_Y$ (25)

where $l_Y$ is the link forming $Y$, a sub-particle of $X$.

Structure

The Jordan Algebra underlying this system means that the
structure of the composite is as important as the particles
that make it up. Through this we can see that not only does
the structure add to the properties of the composite, but also
we can find behaviour in the structure independent of the
particles. By this we find the analogue of an isomer from
real chemistry.

An Example of Emergent Richness 

Throughout we have talked about the richness of the Jordan
Algebra basis for this system and how it allows us to create
a system in which we can have meaningful isomers. We can
see this most clearly when we consider the homomers
(isomers containing only one kind of particle) generated by the
identity matrix, I, which is also an identity under the Jordan
product (and hence all these composites are also represented
by I).

There are five possible structural isomers using just four
atoms I (Figure 3). Each of these has a probability of
forming and a strength (Table 2).

From these results we can see that in the simplest of
circumstances (when we are looking only at the structure of the
isomer), structures with larger links are harder to form but
once created are also harder to destroy. Thus we have
emergently created a system that has stronger and weaker
structures depending on the way the composite particle forms.
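The entries of Table 2 can be reproduced from Equations 22 to 25 with a short recursion. This is a sketch under stated assumptions: every link between identity-derived particles is taken to have probability $\mathcal{N}(0) \times 0.5$ (equal normalised eigenvalues, orthogonal eigenvectors chosen) and strength $\mathcal{N}(0)$, and the mapping of nested tuples to isomers I1 to I5 is inferred from the "maximum reaction size" and "number of links" columns rather than from Figure 3 itself.

```python
import math

N0 = 1 / math.sqrt(2 * math.pi)  # standard normal density at 0
P_LINK = N0 * 0.5                # ASSUMPTION: link probability for identity particles
L_LINK = N0                      # ASSUMPTION: link strength for identity particles

ATOM = "I"


def probability(particle):
    """f_X from Equation 24; atoms exist with probability 1 (Equation 22)."""
    if particle == ATOM:
        return 1.0
    n = len(particle)
    f = P_LINK * 0.1 ** (n - 2)  # p_X times r_X (Equation 23)
    for component in particle:
        f *= probability(component)
    return f


def strength(particle):
    """s_X from Equation 25: product of the strengths of every link."""
    if particle == ATOM:
        return 1.0
    s = L_LINK
    for component in particle:
        s *= strength(component)
    return s


I = ATOM
isomers = {
    "I1": (I, I, I, I),        # one 4-way link
    "I2": ((I, I), I, I),      # a binary link inside a triple link
    "I3": ((I, I, I), I),      # a triple link inside a binary link
    "I4": (((I, I), I), I),    # three nested binary links
    "I5": ((I, I), (I, I)),    # two binary links joined by a third
}

for name, structure in isomers.items():
    print(name, round(probability(structure), 4), round(strength(structure), 4))
```

Under these assumptions the printed values agree with Table 2 to four decimal places, which supports reading Equation 24 as a product over the components.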

These multiple link isomers can form links with particles
of different matrices, which have different eigenvalues. We
give a second example of a base particle for which the five
resultant homomers behave differently. Consider the matrix
M:

$M = \begin{pmatrix} 1 & i & 0 \\ -i & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ (26)

Isomer   Probability   Strength   Maximum reaction size   No. links

M1       0.0019        0.3774     4                       1

M2       0.0037        0.1476     3                       2

M3       0.0037        0.1496     3                       2

M4       0.0073        0.0585     2                       3

M5       0.0070        0.0557     2                       3


Table 3: Particle probability and particle strength for the
isomers of the M atom (isomers 1 to 5 with X = M).

The isomers formed from M all result in the same final 
matrix, as is true of all homomers due to the power associa¬ 
tivity law (Equation 5). Since all the component matrices 
have the same eigenvectors, we never have a product occur¬ 
ring with the same eigenvalue position, as they are perfectly 
aligned and so the linking probability is zero (Equation 17). 
This increases the strength and probability of larger links as 
all pairs in the link can form across the strongest link be¬ 
cause there is no case in which both possible link positions 
are occupied. 

These results show that isomers are all unique: the struc¬ 
ture is not defined by size or number of links (otherwise iso¬ 
mers M2, M3 or M4, M5 would be identical). This also 
shows that the structure has a strong effect on the system. 
The link properties are changing because we are no longer 
working with the identity matrix, and the composites are dif¬ 
ferent from M. Thus when we link with the larger composite 
particles, eg ((M • M) • M), we are in fact linking with a 
different set of eigenvalues (if not necessarily a different set 
of eigenvectors as discussed previously). This still gives us 
a pattern of decreased probability of creation and increased 
strength for a smaller number of larger links. 

We can also see that this behaviour does not indicate a
universal pattern of higher probability causing lower strength
regardless of other properties. Isomer M4 has a higher
probability than isomer M5 but also has a higher strength. We
can also see that the pattern does not stretch across different
homomers, as I isomer I1 is more probable than M isomer M1
and is also stronger. This means that it is not the relationship
between probability and strength that causes this behaviour;
it is a relative effect caused by differently structured
homomers of the same size.

While the probabilities given here for the existence of
composites are small, and tend towards zero for larger
composites, we must remember that this system is intended
to operate over a very large number of interactions.
Thus while the probability of any particular composite
existing is small, particularly for larger composites, the chance
of generating some (large) composite is relatively high given the
number of possible (large) composites.

Other Possible Behaviours 

This is not the only interesting behaviour we might find 
which results directly from non-associativity and a mathe¬ 
matical focus. 

Firstly we may consider looking at the isomers of our sys¬ 
tem in order to understand the more general behaviours of 
isomers. We can look at the probabilities of large molecules 
forming and their ability to act as information storage and 
transfer. We can be certain from the design of the system 
that the formation and replication of large composite parti¬ 
cles is possible. However we cannot be sure of how stable 
or regular these large composites would be. If they are not 
stable then they cannot act as an information storage and 
transfer mechanism as the data would have too high a prob¬ 
ability of corrupting. If they are not sufficiently regular then 
the information stored in them cannot be read in any useful 
manner. 

The concepts of catalysis and substitution are fairly well 
established behaviors looked for in AChems (Hutton, 2003; 
Faulconbridge et al., 2010; Hickinbotham et al., 2010; 
Suzuki et al., 2003). Many other systems implement these 
concepts in addition to their basic reaction mechanism. We 
could create additional capability in our system to enable 
catalysis and substitution, but it is not necessary. This is be¬ 
cause we have a probabilistic system. If we have two parti¬ 
cles, A and B , that have a low probability of linking then we 
can use a composite particle C to generate a larger particle 
((A • C) • B) (Figure 4). When we decompose in the correct 
manner this can leave us with a composite (A • B) whose 
total probability of forming is much higher than the origi¬ 
nal probability of A, B linking directly. It is even possible 
that this would allow objects with perfectly aligned eigen¬ 
vectors, which would normally have a linking probability of 
0, to connect and have a strong resultant link. 

Another well-established desirable behaviour is replica¬ 
tion. We have not eliminated self-replication. In this system 
it would look like a composite forming and then it being 
used much like C in Figure 4 to help the formation of an 
identical composite. Interesting instances of this would be 
cases in which the copy has a higher probability of exist¬ 
ing than the original. This would mean that the composite 
encourages the creation of copies of itself. 

Considering the analogy to real chemistry, we will at
some point want to consider adding a temperature analogue
to our system. This should modify the probability of linking
in the system. In this case it could modify either the
eigenvalues or the eigenvectors of the matrix. There are very few
ways to do this that do not affect both of these, so as well
as changing the probability of linking we would be causing
the composite to "rotate". The structure of the matrices
provides for a direct connection between an analogue of
thermal energy and its effects on the energetic states (such as
rotational or kinetic energy) of our composite particles. In
addition, in our system variation of temperature should lead
to increasing and decreasing probabilities, similar to the
effects in ensembles in real chemistry as described by
statistical thermodynamics.

[Figure 4 diagram labels: A, B with p_AB = 0.01 ==> f_AB = 0.01; A, C with p_AC = 0.4; f_(AC)B = 0.16; f_(A)B = 0.032]

Figure 4: A general example of what the concept of catalysis
would look like in our system, where $d_C$ is the probability of
C decomposing in the correct manner.


Summary and Conclusions 

Creating an AChem using a mathematical basis such that
links and ordering exist gives us a chance to exploit the
open set of Hermitian matrices and their already
well-studied emergent properties. It also provides us with
structure that exists and is capable of displaying emergent
behaviours that cannot be predicted purely from the
mathematical roots of the system.

We have shown that even within the power associativity 
provided by the Jordan Algebra the fact that the underlying 
system is non-associative allows for these strong structural 
influences to become more prevalent and vary the system. 
We have also discussed that this may not remain the case 
when expanded to general isomers but will remain true in 
homomers. 

We have demonstrated that we can find interesting and
important behaviours in this sort of ssAChem that were in no way
designed into our system. This suggests that these systems
have much greater potential in that they are not being lim¬ 
ited by the intentions and goals of their creators. In terms 
of our starting metaphor, we are starting to create bricks and 
mortar, rather than a prefabricated AChem. 
Appendix: worked example
Linking Hermitian Matrices 

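Consider the three Hermitian matrices:

$X = \begin{pmatrix} 1 & -i & i \\ i & 0 & 0 \\ -i & 0 & 0 \end{pmatrix}, \quad Y = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}, \quad Z = \begin{pmatrix} -1 & i & 0 \\ -i & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$ (27)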

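These have the following eigenvector matrices and eigenvalues:

$X: \quad v_X = \begin{pmatrix} 0.58i & 0 & 0.82 \\ 0.58 & 0.71 & -0.41 \\ -0.58 & 0.71 & 0.41 \end{pmatrix}, \quad \lambda_X = (-1 \ \ 0 \ \ 2)$ (28)

$Y: \quad v_Y = \begin{pmatrix} 0.41 & 0.71 & 0.58 \\ 0.41 & -0.71 & 0.58 \\ -0.82 & 0 & 0.58 \end{pmatrix}, \quad \lambda_Y = (0 \ \ 0 \ \ 3)$ (29)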


$Z: \quad v_Z = \begin{pmatrix} -0.71i & -0.71i & 0 \\ 0.71 & 0.71 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \lambda_Z = (-2 \ \ 0 \ \ 0)$ (30)

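We can form three links over these matrices, using the
relevant products:

1. $(X \bullet Y) = \begin{pmatrix} 1 & \frac{1}{2} - \frac{1}{2}i & \frac{1}{2} + \frac{1}{2}i \\ \frac{1}{2} + \frac{1}{2}i & 0 & i \\ \frac{1}{2} - \frac{1}{2}i & -i & 0 \end{pmatrix}$

2. $\{X, Y, Z\} = \begin{pmatrix} -1 & i & -\frac{1}{2} - \frac{1}{2}i \\ -i & -1 & \frac{1}{2} - \frac{1}{2}i \\ -\frac{1}{2} + \frac{1}{2}i & \frac{1}{2} + \frac{1}{2}i & 0 \end{pmatrix}$

3. $((X \bullet Y) \bullet Z) = \begin{pmatrix} -\frac{3}{2} & -\frac{1}{2} + i & -\frac{3}{4} - \frac{1}{4}i \\ -\frac{1}{2} - i & -\frac{1}{2} & \frac{1}{4} - \frac{3}{4}i \\ -\frac{3}{4} + \frac{1}{4}i & \frac{1}{4} + \frac{3}{4}i & 0 \end{pmatrix}$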


Link Properties 

For each of these links we can calculate the probability of
the link, $L$, forming, $p_L$, and the strength of the link, $l_L$. In
order to calculate the probability of $L$ we need to calculate
the probability using each possible choice of eigenvalues.
Taking link 1 we have the probabilities as given in Table 4,
the maximum of which occurs using the second eigenvalues
of X and Y. This gives us a probability $p_{XY} = 0.2992$ and
a strength of $l_{XY} = 0.3989$. These, along with the strength
and probability of the other two links, are summarized in
Table 6.

P     X1      X2      X3      Z1      Z2      Z3

Y1    0.035   0.257   0.041   0.086   0.142   0.362

Y2    0.170   0.299   0.019   0.182   0.299   0.200

Y3    0.027   0.242   0.121   0.118   0.072   0.051

Table 4: Probability of linking with Y for X and Z for each
eigenvalue

P           Z1       Z2       Z3

(X • Y)1    0.0270   0.1210   0.0354

(X • Y)2    0.0782   0.2700   0.0997

(X • Y)3    0.1638   0.0175   0.0135

Table 5: Probability of (X • Y) linking with Z for each
eigenvalue

For link 2 we take the minimum of the probabilities $p_{XY}$
and $p_{YZ}$. The probabilities for $p_{YZ}$ are also given in Table 4
and the maximum occurs with Y's first eigenvalue and Z's
third eigenvalue. This gives $p_{YZ} = 0.3623$, which has a
strength of $l_{YZ} = 0.3989$. We then have that the overall
$p_{XYZ} = 0.2992$ and it has strength $l_{XYZ} = 0.3989$.

For link 3 we need to work out a third set of probabilities
between $(X \bullet Y)$ and $Z$; these will be based on the eigenstate
of $X \bullet Y$ (Table 5):

$(X \bullet Y): \quad v_{(X \bullet Y)} = \begin{pmatrix} 0 & -0.5 - 0.5i & 0.5 + 0.5i \\ -0.71i & 0.5i & 0.5i \\ 0.71 & 0.5 & 0.5 \end{pmatrix}, \quad \lambda_{(X \bullet Y)} = (-1 \ \ 0 \ \ 2)$ (31)

This means the largest probability comes from using the
second eigenvalues of each component and so $p_{(XY)Z} = 0.2700$,
which makes the strength $l_{(XY)Z} = 0.3989$.

Composite Properties 

We next consider the $f$ and $s$ values for the composite
resulting from each link. The first link has only atoms and
is binary so $f_{XY} = p_{XY}$, and since there is only one link
the strengths are also the same: $s_{XY} = l_{XY}$. Similarly
$s_{XYZ} = l_{XYZ}$ as there is only one link, but $f_{XYZ} \neq p_{XYZ}$
as the size of the link reduces the probability to
$f_{XYZ} = 0.1\, p_{XYZ} = 0.0299$.

Finally we have the probability of the third link:
$f_{(XY)Z} = p_{XY}\, p_{(XY)Z} = 0.0808$. The base components
are atoms and the links are all binary so it is simply the
product of the probabilities of both links. The strength
is the product of the strengths of each link:
$s_{(XY)Z} = l_{XY}\, l_{(XY)Z} = 0.1591$.
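The composite values quoted above follow from the link values by simple arithmetic. The sketch below starts from the rounded link probabilities and strengths given in the text and applies Equations 20 to 25; the variable names are illustrative.

```python
# Link-level values as quoted in the worked example (rounded to four places).
p_xy, l_xy = 0.2992, 0.3989          # link 1: (X . Y)
p_yz, l_yz = 0.3623, 0.3989          # Y with Z, needed for link 2
p_xy_z, l_xy_z = 0.2700, 0.3989      # link 3: (X . Y) with Z


def size_factor(n):
    """r from Equation 23: 0.1 for every reactant beyond the second."""
    return 0.1 ** (n - 2)


# Link 2 is a triple link: minimum over the two pairs (Equations 20 and 21).
p_xyz = min(p_xy, p_yz)
l_xyz = min(l_xy, l_yz)

# Composite probabilities (Equation 24); atoms contribute a factor of 1.
f_xy = p_xy * size_factor(2)
f_xyz = p_xyz * size_factor(3)
f_xy_z = p_xy_z * size_factor(2) * f_xy   # (X . Y) is itself a composite

# Composite strengths (Equation 25): product over all links in the composite.
s_xy = l_xy
s_xyz = l_xyz
s_xy_z = l_xy * l_xy_z

for name, f, s in [("1", f_xy, s_xy), ("2", f_xyz, s_xyz), ("3", f_xy_z, s_xy_z)]:
    print(name, round(f, 4), round(s, 4))
```

The rounded results match the composite columns of Table 6.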


Link   Probability   Strength   Composite Probability   Composite Strength

1      0.2992        0.3989     0.2992                  0.3989

2      0.2992        0.3989     0.0299                  0.3989

3      0.2700        0.3989     0.0808                  0.1591


Table 6: Summary of link probabilities and strengths
Abstract

Artificial chemistries (ACs) are useful tools and a simple
shortcut for the study of artificial life. In many works, ACs
are quite straightforward, simplistic or highly unrealistic (or
all combined), but in several works ACs are extremely
complex. Among them, we focus on Hutton's Artificial
Chemistry (HuAC), where reactions act on the nodes of a graph
(the so-called atoms) and the connected components compose
the actual molecules of the environment. The main works
from Hutton are based on a 2D simulator (squirm) with
auto-replication and several other properties. This paper proposes
a computational framework and software that remove the need
for 2D space simulation in the HuAC while keeping many
of the features of this chemistry. It relies on the Stochastic
Simulation Algorithm, which has been adapted here to work
on graph structures. In order to test it, we simulated Hutton's
auto-replication, which relies heavily on strong spatial
interactions, in a spaceless environment. In addition, thanks to the
increase in performance, we develop some preliminary work
on Random Chemical Worlds where reactions are randomly
selected. We show on simple metrics that the fraction of
reactions among all possible ones is a general parameter that acts
on the system similarly to a phase transition.

Introduction 

Most artificial life questions revolve around understanding
and providing clues to the origin of life and the evolution of
organisms starting from scratch (Hutton (2003)). Because of
the difficulty of addressing these questions with real-life (or
'wet') experiments, simulation and artificial modelling seem
to be the most adequate tools (Dorin and Korb (2007)).
Unfortunately, completely reproducing life-like systems is still
not an option for understanding the general features used by
living systems to adapt and develop. Few artificial systems
can be designed without the need to fix rules between the
building-block components of living organisms. This usually
requires designing an artificial chemistry scheme (Dittrich
et al. (2001)), and this need obviously extends to problems in
artificial life and evolutionary strategies. Indeed, put simply,
since real life rests on chemistry, artificial life should rely on
artificial chemistry (Suzuki and Dittrich (2009)).


What is usually done is that most of the chemistry is
prescribed: it has a small dimension (a small number of
reactions); it is straightforward, that is, the chemistry graph is
either extremely simplistic or small (Knibbe and Parsons
(2014)); and it is somewhat unrealistic, for example most
transition energies are ignored. Several ACs have been
recently developed to tackle specifically this issue of energy
transition (Benko et al. (2005); Ducharme et al. (2012);
Benko et al. (2009)). Additionally, problems with mass
conservation can arise, e.g. A + B → C and C → A. It is
expected that any evolvable scheme will exploit this kind of
easy shortcut.

So what properties should an AC possess for more general
life-like and evolution-based framework experiments?
An AC must be complex, rich and generic. More precisely,
first, an AC should be large and have a huge number of
reactions. Second, and this is related to energy transitions, not
all reactions should be possible. This amounts to requiring
that components (molecules) cannot react with all others.
Third, mass conservation is obviously required. This problem
usually arises when the chemistry of the system is
procedurally generated, for example using an artificial genome
(Rocabert et al. (2015)). In other words, reversibility should
not be hacked. Finally, an AC should allow some form of
open-endedness: we do not know all reactions and molecules
in advance (Lenaerts and Bersini (2009)).

Several frameworks have been published that address
some of the properties described above (Tominaga et al.
(2009); Oohashi et al. (2009)).

One of them in particular, Hutton's Artificial Chemistry
(HuAC), has several of these properties (Hutton (2007,
2004, 2002)). The central feature (cf. Model) is to describe
the chemistry as reactions between atoms with a fixed type
and a changing state, while describing molecules as connected
graphs of atoms. Note that the term atom refers to the
smallest structure in the system and can describe structures, or
domains, that are larger than actual atoms. The chemistry
is a set of reactions on pairs of atoms and, strikingly, not on
molecules. As in the classical sense, these atoms need to
encounter each other to react (as in a classical bimolecular
reaction). However, the originality of HuAC lies in so-called
conformation reactions: reactions that occur between bonded
atoms. HuAC possesses all the elements required to be of
wider use. It has some element of open-endedness, but part
of the chemistry's reaction set has been finely tuned to obtain
the expected outcome.

Understandably, hand design was part of the proof of
concept of the chemistry (mainly to display self-replication),
but it lacks generality. The simple question of what is the
'best' set of chemistry rules to obtain a life-like chemical
system, where for example artificial evolution can be tested,
is still open. Overall, the main drawback is that HuAC is in
essence a 2D construction with, although this is never
explicitly mentioned, a strong diffusion-limited component.

This is a drawback for several reasons. First, 2D systems
are nice but slow to simulate: most of the time between
reactions is spent just simulating diffusion. Second, 2D
simulations are simple to implement but introduce unnecessary
topological constraints, such as chirality, that cannot be
overcome without strong tuning. Third, as we will see in more
detail later, all the tuning relied on strong spatial assumptions:
correlation of positions, ignoring mixing effects. Finally, for
the sake of completeness, HuAC does not really introduce
reaction rates in a biochemical sense: bimolecular reactions
occur immediately upon collision, and conformation reactions
occur instantly. Thus, there is no notion of reaction rates (or
affinity, etc.).

The aim of this article is to provide a framework, a
stochastic simulator that simulates Hutton's artificial
chemistry, with several properties. First, it describes this
chemistry in a well-stirred medium (which amounts to 'infinite
diffusion') using the stochastic method known as the Gillespie
algorithm. This algorithm is modified to suit Hutton's
artificial chemistry, which keeps track of graphs of atoms and
not only of atom abundances. Our framework also introduces
reaction rates and other reactions that were not included in the
descriptions of Hutton's original work. We also examine
HuAC's dependence on 2D structural constraints, and
especially the impact of spatial correlation on Hutton's main
algorithm: auto-replication. Finally, since this framework can
handle a huge number of reactions and/or molecules, we
present preliminary work on randomly generated Artificial
Chemistries and study their properties on simple metrics.
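To make the well-stirred setting concrete, the following is a minimal sketch of the textbook Gillespie direct method on species counts. It is not the graph-based adaptation described in this paper: it tracks abundances only, uses the enzyme-substrate reactions R1 to R3 introduced in the Model section, and the rate constants, initial counts and function names are illustrative assumptions.

```python
import random

# (rate constant, reactants, products); the rate values are assumptions.
REACTIONS = [
    (0.01, ["s0", "e0"], ["s1e1"]),      # R1: s0 + e0 -> s1e1
    (1.0,  ["s1e1"],     ["s2", "e0"]),  # R2: s1e1 -> s2 + e0
    (0.5,  ["s1e1"],     ["s0", "e0"]),  # R3: s1e1 -> s0 + e0
]


def propensity(counts, rate, reactants):
    """Mass-action propensity; valid here because reactant species are distinct."""
    a = rate
    for species in reactants:
        a *= counts[species]
    return a


def gillespie(counts, reactions, t_end, rng):
    """Direct method: draw the waiting time, then the reaction that fires."""
    counts = dict(counts)
    t = 0.0
    while True:
        props = [propensity(counts, rate, reactants)
                 for rate, reactants, _ in reactions]
        total = sum(props)
        if total == 0:
            break                        # nothing can react any more
        t += rng.expovariate(total)      # exponential waiting time
        if t > t_end:
            break
        _, reactants, products = rng.choices(reactions, weights=props)[0]
        for species in reactants:
            counts[species] -= 1
        for species in products:
            counts[species] += 1
    return counts


initial = {"s0": 100, "e0": 10, "s1e1": 0, "s2": 0}
final = gillespie(initial, REACTIONS, t_end=50.0, rng=random.Random(1))
print(final)
```

In this count-based form, no time is spent simulating diffusion: each iteration jumps directly to the next reaction event, which is the source of the performance gain over a 2D physical simulation.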

Model: Hutton’s Artificial Chemistry 

We start this section with a brief refresher on Hutton's
chemistry (HuAC). HuAC has been published in several papers
(Hutton (2009, 2007, 2004, 2002)) but, to our knowledge,
has had only one follow-up (Lucht (2012)). The main
advantage of HuAC is that its description is really
straightforward. The chemistry's rules are very simple. However, it
can grow extremely complex due to its graph structure.
Briefly, Hutton's chemistry is composed of molecules that
are graphs of atoms (in Hutton's words). The term atom does
not encompass actual atomic structure but merely describes
the smallest compound in the system. Atoms have a fixed
type and a changing state that are both integers, described
by a pair $(t|s)$ with $t, s \in \mathbb{N}$ (note: in Hutton's papers the
type is a letter and the state is an integer, e.g. a0, e8, ...).
Molecules are simply connected graphs whose nodes are
atoms and whose edges are connections (bonds). Any atom
can have any number of connections. Therefore the chemistry
is composed of connected subgraphs (connected components),
which form the set of molecules. The chemistry relies on a
physical simulator in 2D. All atoms are spatially resolved and
each atom has its own id and position. Atoms are hard
spheres (of equal size) that undergo some kind of Brownian
motion in a viscous environment, and links are coded using
springs $-k(p - q)$.

Reactions are based only on atoms and are of the form:

$(t_1|s_1)\,(.|+)\,(t_2|s_2) \rightarrow (t_1|s_3)\,(.|+)\,(t_2|s_4)$ (1)

where . denotes a link, i.e. the atoms $(t_1|s_1)$ and $(t_2|s_2)$
must be linked for the reaction to occur, as it describes a
conformational change in the pair. Likewise, + denotes an
encounter reaction: $(t_1|s_1)$ and $(t_2|s_2)$ must not be linked
together, and the reaction occurs whenever the two atoms
collide (whether or not they are otherwise linked to other
atoms). Note that one way to ensure mass conservation is
to never allow the type to be changed in the equations. All
reactions occur locally, so other links are not modified (they
can be later if there is a reaction matching the new links
in the graph). In the original papers, a conformational
reaction, noted ., is performed instantaneously, and when several
are possible they "are chosen at random", presumably with a
uniform distribution.

This chemistry is extremely general and very easily encompasses
several classical schemes, such as the enzyme-substrate-product
scheme S + E ⇌ C → P + E, using

R1: s0 + e0 → s1e1

R2: s1e1 → s2 + e0

R3: s1e1 → s0 + e0

and also the so-called Hit & Run reactions S + E ⇌ C with
C + A → C + B:

R4: s0 + e0 → s1e1

R5: e1 + b0 → e1 + b1

R6: s1e1 → s0 + e0

Note that in the last set of equations the atom e1 is
the functional equivalent of the complex C. Technically any
e1 in the reactor can react with a b0 to yield a b1.

However, due to its atom/graph organisation, this chemistry
can quickly give rise to complex structures. For example,
the simple reactions:

R7: a0 + a1 → a1.a2

R8: a0 + a1 → a2.a1

(where the only difference is a swap between target states)
for a given number of a0’s and a1’s in the reactor yield two
different graph structures, as shown on Fig. 1.



Figure 1: Examples of graph structures obtained using only
R7 (top) and R8 (bottom), starting with four a0 and one a1.

Hutton’s original papers also come with an auto-replication
scheme where a given arbitrary single-strand molecule
e8.x1...x1.f1 can automatically be duplicated using a
given set of reactions:

R9: e8 + e0 → e4e3

R10: x4y1 → x2y5

R11: x5 + x0 → x7x6

R12: x3 + y6 → x2y3

R13: x7y3 → x4y3

R14: f4f3 → f8 + f8

R15: x2y8 → x9y1

R16: x9y9 → x8 + y8

Here wild-card reactions are introduced using x and y. For
example, x4y1 refers to linked atoms of any type but in states
4 and 1, whereas x4 + x0 refers to the collision of two atoms
of the same type (but of any type) in states 4 and 0. One can
see that this does lead to a replication of any sequence
e8.x1...x1.f1. Elements are added and linked using
reaction R11, and the splicing is initiated by R14 and
propagated via R15/R16 (see Fig. 2 for a walkthrough).
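The wild-card convention can be made concrete with a small matcher. The following Python sketch is an assumption-laden illustration, not Hutton’s code: the rule encoding (a tuple of two (type, state) patterns plus a flag saying whether the pair must be linked) and the function names are hypothetical, only left-hand sides of rules are checked, and x and y are allowed to bind to the same concrete type.

```python
WILDCARDS = ("x", "y")

def bind_types(pattern_types, atom_types):
    """Bind wildcard letters to concrete types; return None on mismatch."""
    binding = {}
    for pattern, actual in zip(pattern_types, atom_types):
        if pattern in WILDCARDS:
            # setdefault stores the first binding and returns the stored one.
            if binding.setdefault(pattern, actual) != actual:
                return None
        elif pattern != actual:
            return None
    return binding

def rule_matches(rule, atom1, atom2, linked):
    """Check whether the left-hand side of `rule` matches an atom pair.

    rule: ((type1, state1), (type2, state2), needs_link)
    atom1, atom2: (type, state) tuples; linked: are they bonded?
    """
    (type1, state1), (type2, state2), needs_link = rule
    if linked != needs_link:
        return False
    if (atom1[1], atom2[1]) != (state1, state2):
        return False
    return bind_types((type1, type2), (atom1[0], atom2[0])) is not None

# Left-hand sides only: a linked pair in states 4 and 1 (as in R10),
# and a collision of two atoms of the same type in states 5 and 0 (R11).
R10 = (("x", 4), ("y", 1), True)
R11 = (("x", 5), ("x", 0), False)

print(rule_matches(R10, ("e", 4), ("a", 1), linked=True))
print(rule_matches(R11, ("a", 5), ("b", 0), linked=False))
```

Repeating the same wildcard letter on both sides of a pattern is what enforces "same type, but any type"; two different letters leave the types unconstrained.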

This AC has several desirable features for an artificial
chemistry. It is very general and allows very complex features
to emerge (as attested by Fig. 1); namely, molecules can
be very complex. It also comes with built-in mass
conservation. On the other hand, because the graph geometry
is embedded in 2D, there are some chirality issues,
and of course this AC is computationally demanding. Also,
as mentioned, from a chemical standpoint, there are no
reaction rates.

The final remark is that most results published on this AC
have used hand-crafted and well-designed sets of reactions.
In particular, no clear thought process was described that
explained how the replication system could work. It seems to
us that it was drawn by hand and that the graph evolution was
constructed with an implicit bias due to proximity. However,
cells and, more generally, biological media are disordered and
can be highly diffusive. And of course processes take place,
mostly, in three dimensions.



Figure 2: Walkthrough of the replication R9-R16 with an
initial seed of e8a1f1, reproduced from Hutton (2002).


The very complexity of the HuAC led us to design a system
that would be more tractable while keeping the essence of
the original chemistry. We are also able to experiment with
a large number of particles and compounds and to deal with a
large number of possible reactions, in order to obtain more
reliable results on the potential of HuAC.


STAARC: STochastic Atom-based ARtificial 
Chemistry 


We developed the STAARC framework to use the HuAC in
a virtual reactor that completely eliminates spatial location.
Due to the particle-based nature of HuAC and its graph
structure, reactions are identified and occur concurrently but
asynchronously at a given speed. Therefore, the ODE formalism
does not seem realistically tractable for this problem.
This framework is based not on differential equations but on
the Stochastic Simulation Algorithm (SSA) as originally
developed by Gillespie (Gillespie et al. (2013); Gillespie
(2007)).

Stochastic Simulation Algorithm

The important feature of the Gillespie algorithm is that we
simulate the world only at times when a reaction occurs and
not in between. Contrary to the ODE formalism, where the
description can proceed with arbitrarily small time steps, in
SSA all that is needed is to estimate which reaction will be
the next, when it will occur, and to actually apply the
reaction. Since both the time and the reaction are drawn
according to a given distribution, it is a simple means of
obtaining noise and variability in molecular reactions. Note,
however, that both the ODE and its SSA counterpart describe the
same system with the same hypotheses and therefore yield the
same results, the ODE being the description of the dynamics of
the average.

To function, the Gillespie algorithm introduces propensities
$a_i$: the average rate at which reaction $i$ can occur in the
medium. For a bimolecular reaction $X + Y \xrightarrow{k}$ this
rate is equal to $kxy$, where $x$ and $y$ are the numbers of
molecules X and Y respectively in the reactor. Similarly, the
propensity of a unimolecular reaction $X \xrightarrow{k}$ is
$kx$. Note that if X = Y, i.e. for a bimolecular reaction
involving the same type of molecule, the propensity is equal to
$kx(x - 1)$. Propensities are the speed at which a reaction
occurs and therefore have units of time$^{-1}$.
The algorithm is as follows. Once all reaction propensities
are computed, we compute the propensity of the system:

$$a = \sum_{i \in \text{Reactions}} a_i$$

and Gillespie showed that the time for the next reaction to
occur is exponentially distributed with this parameter $a$.

Since reactions occur in proportion to their rates, a simple
random selection weighted by relative propensity yields the
next reaction. Once the reaction is selected, it is applied,
i.e. the reactants are removed and the products are added.
This in turn modifies the propensities, which need to be
re-evaluated, and so forth. Note that in this algorithm
propensities are only recalculated when a reaction has
occurred, and that they depend only on the numbers of the
reactants involved in actual reactions. Only one set of
reactants is updated at each step of the algorithm and,
finally, each step involves drawing only two random numbers:
one to find the next reaction time and another to find the
reaction itself.
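One step of the generic algorithm just described can be sketched as follows. This is a minimal Python illustration, not the STAARC code (which is written in C#); the function name `gillespie_step` and the toy propensities in the usage lines are assumptions made for the example.

```python
import math
import random

def gillespie_step(propensities, rng=random):
    """One SSA step: return (waiting_time, reaction_index).

    propensities: list of non-negative rates a_i.
    Returns None when the total propensity is zero (nothing can fire).
    """
    total = sum(propensities)
    if total <= 0:
        return None
    # 1 - random() lies in (0, 1], so the logarithm is always defined.
    tau = -math.log(1.0 - rng.random()) / total
    # Second random number: pick reaction i with probability a_i / total.
    threshold = rng.random() * total
    cumulative = 0.0
    chosen = None
    for index, a_i in enumerate(propensities):
        if a_i <= 0:
            continue  # a reaction with zero propensity can never fire
        chosen = index
        cumulative += a_i
        if threshold < cumulative:
            break
    return tau, chosen

# Toy usage: X + Y -> ... with k = 0.01 and X -> ... with k = 0.5.
x, y = 30, 20
print(gillespie_step([0.01 * x * y, 0.5 * x]))
```

After the chosen reaction is applied, only the propensities that involve the modified reactants need to be recomputed before the next call.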

This description of Gillespie can be applied as is to the
HuAC scheme with a few tweaks. First, we simply add a rate
$k_r$ to each reaction $r$.

Now the main hurdle is to have a data structure that contains
all the information about the atoms and molecules (which are
the connected components). Therefore the first, and important,
difference with the SSA scheme is that we need to keep track
of each individual particle $p_i = (t_i, s_i)$ for
$i \in [1, N]$ ($N$ being the number of particles) and of all
the edges (links) between particles
$\mathcal{E} = \{(p_i, p_j) \mid p_i \text{ linked to } p_j\}$.
Note that in the SSA we normally keep track only of the number
of each reactive species.

For a given pair $(t|s)$, let
$A(t, s) = \{p_i \mid t_i = t, s_i = s\}$ be the set of
particles of this given type and state, and let
$a(t, s) = |A(t, s)|$. Also let
$\mathcal{B}(t_1, s_1, t_2, s_2) = \{(p_i, p_j) \mid p_i \in A(t_1, s_1),\ p_j \in A(t_2, s_2),\ (p_i, p_j) \in \mathcal{E}\}$
and finally let
$b(t_1, s_1, t_2, s_2) = |\mathcal{B}(t_1, s_1, t_2, s_2)|$.

For the bimolecular reactions of the type:

Ra: $(t_1|s_1) + (t_2|s_2) \rightarrow \dots$

Ra’: $(t_1|s_1) + (t_1|s_1) \rightarrow \dots$

the rates will be equal to

Ra: $k_a \left( a(t_1, s_1)\, a(t_2, s_2) - b(t_1, s_1, t_2, s_2) \right)$

Ra’: $k_{a'} \left( a(t_1, s_1) \left( a(t_1, s_1) - 1 \right) - b(t_1, s_1, t_1, s_1) \right)$

We basically need to remove from the count the atoms that are
already linked, since the reaction concerns only unlinked
atoms. We also need to take care of the case when atoms have
the same type and state.

Also, for a reaction between linked atoms of the type:

Rb: $(t_1|s_1)(t_2|s_2) \rightarrow \dots$

the rate will be equal to

Rb: $k_b\, b(t_1, s_1, t_2, s_2)$

We compute the total propensity $a$ as the sum of the rates of
all reactions using the formulas above and draw our two random
numbers. The first one, $r \in [0, 1]$ drawn uniformly, is used
to compute the next time $\tau = -\log(r)/a$, and the other,
$s$ (drawn uniformly in $[0, a]$), determines the reaction $i$
such that:

$$\sum_{k \in \text{reactions},\, k < i} a_k \le s < \sum_{k \in \text{reactions},\, k < i+1} a_k$$

Once a reaction is selected, we apply the modification to a
given pair (selected uniformly at random). Due to the SSA
scheme only one reaction is applied at each step, so at most
one update is needed.
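The atom-based propensities can be computed directly from the particle list and the edge set. The Python sketch below is illustrative only: the function names are hypothetical, atoms are plain (type, state) tuples, and it assumes that links are symmetric, so that $b$ counts ordered pairs $(p_i, p_j)$. With that convention a link between two atoms of the same class is counted twice, which matches the ordered count $a(a - 1)$ in the same-class case.

```python
from collections import Counter

def class_counts(atoms):
    """a(t, s): number of atoms in each (type, state) class."""
    return Counter(atoms)

def linked_count(atoms, bonds, class1, class2):
    """b(t1, s1, t2, s2): ordered linked pairs (p_i, p_j) with p_i in
    class1 and p_j in class2. Each bond is stored once as (i, j) but is
    treated as symmetric (an assumption of this sketch)."""
    total = 0
    for i, j in bonds:
        if atoms[i] == class1 and atoms[j] == class2:
            total += 1
        if atoms[j] == class1 and atoms[i] == class2:
            total += 1
    return total

def encounter_propensity(k, atoms, bonds, class1, class2):
    """Rate of an encounter (+) reaction: unlinked pairs only."""
    a = class_counts(atoms)
    b = linked_count(atoms, bonds, class1, class2)
    if class1 == class2:
        return k * (a[class1] * (a[class1] - 1) - b)
    return k * (a[class1] * a[class2] - b)

def linked_propensity(k, atoms, bonds, class1, class2):
    """Rate of a conformational (.) reaction: linked pairs only."""
    return k * linked_count(atoms, bonds, class1, class2)

# Two a0 and two a1 atoms, with one a0-a1 bond.
atoms = [("a", 0), ("a", 0), ("a", 1), ("a", 1)]
bonds = {(0, 2)}
print(encounter_propensity(0.5, atoms, bonds, ("a", 0), ("a", 1)))
print(linked_propensity(2.0, atoms, bonds, ("a", 0), ("a", 1)))
```

Recounting from scratch is the simplest correct option; a faster simulator would presumably update the counts incrementally, since each step modifies a single pair.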

Properties 

Akin to a more realistic chemistry, all reactions now have
a rate. This simulates a well-mixed 3D reactor without
any of the chirality issues that were only related to 2D.
Almost all the HuAC properties are conserved, in particular
complex graphs. We can mimic diffusion to a certain point by
modifying the actual rate of bimolecular reactions. Indeed,
the Gillespie algorithm is the limit at infinite diffusion,
but the reaction rate can be modified to account for diffusion
using the Smoluchowski equation (Szabo (1989)):


n being the thermodynamic rate (when $D \to \infty$).

We obtain an extremely fast computation where the only
simulated moments are those when a reaction occurs. We also
update the data structure that keeps track of the graph for
only one pair at a time. The framework also allows a wider
variety of reactions to be simulated. Indeed, several
reactions can be added to the simulator to allow even more
realism, such as production $\emptyset \xrightarrow{k_p} (t|s)$,
degradation $(t|s) \xrightarrow{k_d} \emptyset$ and
’auto-conformation’ $(t|s_1) \xrightarrow{k_c} (t|s_2)$, with
respective rates $k_p$, $k_d\, a(t, s)$ and $k_c\, a(t, s_1)$.
The simulator was written in C# and is available under the MIT
licence at

https://github.com/hsoula/staarc.

Results 

We provide in this section two examples of the output of
the simulator: we reproduce the auto-replication scheme of
Hutton and we create Random Chemical Worlds to study
their properties. Because of its flexibility, the simulator
allows us to test several possibilities in the set of rules
and rates, and also to compare with the 2D case of HuAC. Due
to the local nature of the replication process, we expected it
to fail completely in an infinite diffusion medium.




Figure 3: Examples of replication errors. A) Due to the lack
of local constraints and to concurrent replication, bimolecular
reactions can occur with atoms from another replication
process. B) Due to a race condition: a conformational change
occurs too late to prevent another bimolecular reaction.


Vanilla Replicator 

We submitted to our simulator the restricted set of equations
(R9-R16) described above to check how replication occurs.
When drawn by hand, the replication seems to occur
flawlessly, as shown on Fig. 2, but inspection of equation
R12 shows a race condition. For the replication to continue,
atom a6 must encounter (any) atom x3 (here e3). In a 2D
and diffusion-limited environment, the closest possible atom
would be the e3, but it could theoretically be any atom from a
concurrent replication elsewhere (see Fig. 3A). This
occasionally happened in Hutton’s original simulation and, as
is the case here, does not impact the stability of the
replication; it only changes the sequence in the replicated
molecules. This deviation in replication is expected because
local interaction and spatial correlation can strongly modify
bimolecular reactions, either transiently (Van et al. (2014))
or at equilibrium (Care and Soula (2011, 2013)), by modifying
encounter probabilities.

Figure 4: Number of connected components (molecules)
at the end of the simulation of the replicator R9-R16. Two
parameters were varied: first, the rate of conformation versus
bimolecular reaction (rate ratio), to simulate the impact
of diffusion; second, the number of particles. The simulator
was tested with an initial replicator seed
e8a1b1c1d1f1 of length 6, in addition to a number of x0 atoms
with $x \in \{a, b, c, d, e, f\}$ varying from 60 to 6000 (from
5 to 1000 for each type). Results are displayed as mean
± standard deviation (computed on 20 runs). For a perfect
replication, the average molecule size should be 6, plotted as
a dashed line to guide the eye. A small jitter has been added
on x values for clarity.

In addition, there is another race condition, on reaction
R13. If R13 does not occur fast enough, the a3 atom can
become linked with another x6 atom and turn into an a2,
blocking the replication (see Fig. 3B). This second race
condition occurs because in the original scheme conformation
reactions were instantaneous, which is not the case anymore.

As we mentioned, this replication scheme is highly local
and space dependent and can probably fail when confronted
with a well-stirred and infinitely crowded medium. In order to
test the resilience of the replication, we created a reactor
with an initial molecule e8a1b1c1d1f1 of size 6 and provided
the medium initially with $N$ atoms of each type at state 0.
This should start the replication process. All conformation
reactions had a given rate $k$ and bimolecular reactions a rate
$\lambda k$ (with $\lambda < 1$ the rate ratio, to take
diffusion into account).

We computed the average size of the molecules at the end
of the process, i.e. when no reaction can occur ($a = 0$).
The results are displayed on Fig. 4 as mean ± standard
deviation (on 20 runs) for various $N$ (so $6N$ total atoms)
and various $\lambda$.

Figure 5: Results of long replication. Left: Average
molecule size through time. Right: Standard deviation
of molecule size.

Normally, when replication occurs correctly, the average
size of the molecules at the end of the process should
be 6 (dashed line displayed to guide the eye). For low
values of $N$ everything works correctly, except if $\lambda$
is close to 1, when race conditions occur frequently and stop
the replication early. This race condition ’error’ occurs
more and more frequently with increasing $N$ because
bimolecular reactions occur more frequently with increasing
concentrations. When $\lambda$ is very small, however, most
errors come from intertwined replications creating either big
molecules ($N = 100$ and $\lambda = 10^{-6}$) or the minimal
replicator e8f1 of size 2 ($N = 100$ and $\lambda = 10^{-6}$).
Minimal replicators are the only stable replication seed that
will ignore other replication interferences. In Hutton’s
original papers, he performed an environmental ’wash’ by
setting all atoms in a quadrant of the environment to state
zero and unlinking them. He showed that the minimal replicator
was ’selected’ by this wash. We show here that it is enough to
have a high population and ideal mixing to achieve the same
result.

Long Replicator 

We tested a longer episode of replications, with initially
10 replicator seeds e8a1b1c1d1f1 (of length 6) and no
other particles. We added production and degradation
reactions for particles x0:

R17: (x|0) → ∅
R18: ∅ → (x|0)

and let the system run for 450,000 reactions (around
$1.2 \times 10^9$ time steps). Here conformational rates were
equal to 1 and the bimolecular reaction rate was $10^{-4}$.
Due to both degradation and production (note that once bound,
an atom does not degrade anymore), the system slowly feeds
atoms to the various replication seeds. Since $\lambda$ is low,
most ’errors’ are from entwined replications that create
bigger and bigger molecules, as seen on Fig. 5. The average
size increases at the start of the experiment and slowly
settles to the actual size of the initial replication seeds:
6 (a dashed line is displayed to guide the eye; another line
marks the size 2 of the minimal replicator). Since the number
of particles grows through time, averages are deceiving. Also
on Fig. 5, the standard deviation of the molecule size is
displayed and shows that, at times, some molecules become
extremely big compounds. As long as the replication process
goes on, they decrease to smaller sizes via separation. This
alternation is particularly well illustrated on Fig. 6, which
shows the number of replications over time. High variation
corresponds to big compounds slowly building (few replications
per time unit) followed by quick disintegrations.

These results suggest that, when fed with atoms in a 
steady state manner, the replication scheme is fairly stable 
and in the end produces molecules whose sizes are the same 
as the initial replication seed. 



Figure 6: Number of replications through time (MC time) for
the long replication experiment. A replication is counted when
rule R16 occurs with an atom of type ’e’. Alternating periods
of extreme and slow replication are clearly visible.


Random Chemical World 

The simulator allows us to handle several thousands of
atoms and thousands of reactions. It is therefore possible
to test use cases where the reactions are drawn randomly
and to vary the number of reactions available. Let
the set of available types be {a, b, c} and the maximum
number of states be 5. We can enumerate all the possible rules
for encounter reactions and conformations, with neither
production nor degradation. We start with $N$ particles
$(t|s)$ with $t \in \{a, b, c\}$ and $0 \le s \le 4$, and
$p \in [0, 1]$ describes the fraction of the reactions kept.
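As an illustration of how such a random chemistry could be drawn, the Python sketch below enumerates every type-preserving rule of the form of equation (1) for three types and five states, and keeps each rule independently with probability p. The enumeration details are assumptions made for this example and are not taken from the STAARC source: reactant order matters, rules that change nothing are skipped, and "fraction kept" is interpreted as independent Bernoulli sampling.

```python
import itertools
import random

TYPES = "abc"
STATES = range(5)

def all_rules():
    """Enumerate rules (t1|s1) [link?] (t2|s2) -> (t1|s3) [link?] (t2|s4).

    Types are never changed, which guarantees mass conservation.
    """
    rules = []
    for t1, t2 in itertools.product(TYPES, repeat=2):
        for s1, s2, s3, s4 in itertools.product(STATES, repeat=4):
            for before, after in itertools.product((False, True), repeat=2):
                if (s1, s2, before) == (s3, s4, after):
                    continue  # skip rules that change nothing
                rules.append(((t1, s1), before, (t2, s2), (s3, after, s4)))
    return rules

def sample_chemistry(p, rng):
    """Keep each possible rule independently with probability p."""
    return [rule for rule in all_rules() if rng.random() < p]

rng = random.Random(42)
chemistry = sample_chemistry(1e-3, rng)
print(len(all_rules()), len(chemistry))
```

Under these assumptions a fraction of $10^{-4}$ to $10^{-3}$ of the rule set corresponds to only a handful of reactions up to a few tens.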

We simulate for a maximum of 2,000 reactions. We chose
a rate of 1 time$^{-1}$ for conformation reactions and
$10^{-2}$ time$^{-1}$ for collision reactions. We simulated 20
different ACs for each increasing fraction $p$, simulating the
first 2,000 reactions for $N = 10{,}000$ initial atoms. In
addition, we provided larger molecules by adding random links
between atoms (with increasing probability). We used a very
simple metric: the ratio of the number of molecules between
the start and the end of the experiment. These results are
displayed on Fig. 7. When the number of chemical reactions
available is low, the AC does not modify the structure of the
particle graph, even when it is already structured (orange
star), whereas a large number of reactions yields a
standardised particle graph with very low variability among
chemistries. The interesting behaviour occurs at the
transition, where variability is highest, between a
$10^{-4}$ and a $10^{-3}$ fraction of possible reactions.


Figure 7: Ratio of the number of connected
components in the particle graph between the start and the
end of the simulation, as a function of the fraction (in log)
of the reactions. Values are mean ± standard deviation. A
small jitter has been added on x values for clarity.

The same results hold when looking at the dynamics, i.e.
the actual time (in time steps) it took to complete the 2,000
reactions (see Fig. 8). Conformations are the fastest
reactions but can only occur once molecules are already
formed. Therefore the interesting situations occur when there
is a mix of bimolecular and conformation reactions to keep the
system going.

Discussion 

Artificial chemistries are extremely useful to understand
and develop artificial life simulations. They will undoubtedly
prove interesting in the future in the context of
metabolic networks, either for theoretical considerations
or for the artificial recreation of entire cells. In this
context, intricate and complex chemistries will be needed to
create complex and open-ended simulated environments that
could yield non-trivial and emergent properties.

Among complex chemistries, we used the Hutton Artificial
Chemistry (HuAC) that, while very general, can generate
complex chemistries by acting on the nodes of molecules
considered as connected graphs. The drawbacks of HuAC
are linked to its spatial nature, both in computational
power and in its unrealistic dependence on 2D constraints
to function.

Figure 8: Total time (in log) at the end of the simulation
(2,000 reactions) as a function of the fraction (in log) of the
number of reactions. Values are mean ± standard deviation.
A small jitter has been added on x values for clarity.

We provided here a Stochastic Simulation Algorithm that
recreates HuAC in a fully stirred 3D medium. HuAC
has the problem of being locally constrained and has spatial
correlation dependencies that do not exist in a well-stirred
medium. Even though we have argued and shown elsewhere
that spatial correlations should not be ignored in the
context of real cell signalling, we contend that the loss of
spatial correlation in the context of HuAC is a net gain due
to its strong dependence on 2D.

We tested the impact on Hutton’s original replicator.
We showed that simple diffusion in a high-concentration
medium achieves the same result as his environmental
’wash’. We also showed that race conditions between
reactions can stop the replication process altogether.

Our simulator allows long simulations to be performed, so we
could test the replication process to its limit. By adding
atoms stochastically, we showed that the resulting replication
process was able to stabilise and provide the correct
replication size (on average), provided we waited long enough
for episodes of enormous compounds to subside.

We finally tried to build Random Chemical Worlds and to
examine some of their simple properties when the reactions are
chosen randomly. However, this work is preliminary, and
random selection of reactions is unlikely to provide
complex behaviours. A higher number of reactions that
are linked together (e.g. by forming a chain between reactants
and products) will probably yield more reaction time and
more modifications of the molecule graphs. In addition, the
starting ’soup’ of randomly connected atoms is also unlikely to
provide interesting situations, and we should investigate the
impact of an initial set of bigger molecules. Not surprisingly,
both these requirements are indeed met in the replicator
situation. Therefore future work should include the study of
random ’graphs’ of reactions instead of random reactions.

The simulator, which we called STAARC, can be
used to scale up to very general and procedural chemistries:
scale up in terms of size and time but also in the number of
reactions that can be handled. Our hope here is that it will
prove to be well suited for the study of artificial evolution.

Acknowledgments. I would like to thank V. Liard, D. Parsons
and C. Lothe of the INRIA Team Beagle for their help.

Many of the chemistries studied by chemists as being relevant
to the origins of life produce a combinatorial explosion
of products. Such chemistries include Miller-Urey chemistry,
HCN polymerisation, Fischer-Tropsch synthesis and
the Formose reaction, to name just a few. As well as producing
a huge variety of products, these reactions can produce
some of the basic molecules that comprise living systems,
such as amino acids or sugars. The combinatorial explosions
occur because the basic building blocks of organic chemistry
can be put together in a huge variety of ways; the number of
possible molecules that can be made in these systems grows
exponentially (or faster) with their size. Moreover, these
networks do not have obvious symmetries or other forms of
structure that would make them easy to model and analyse.

Researchers in the origins of life have traditionally
regarded such messy combinatorial explosions as a problem
that needs to be overcome (e.g. Schuster, 2000). However,
technology is now progressing to the point where we can
explicitly map out the products produced by these chemistries
and the networks of reactions that lead to them (e.g.
Andersen et al., 2013). The emerging field of systems chemistry
has begun to shift the focus away from static descriptions
of complex chemical systems and towards an understanding
of their dynamics. With this increased interest comes the
realisation that the products of messy prebiotic chemistries
are not necessarily just an inert “tar” but may be extremely
complex dynamic systems in their own right, which we have
simply lacked the right tools to study up to now.

This raises a number of important questions regarding the
dynamics of such complex, messy chemistries. One such
question is whether a sufficiently large and complex reaction
network can behave fundamentally differently from what is
possible in smaller, cleaner networks. Here I demonstrate an
example of this, with the aid of a simple toy example. The
example shows the existence of a phase transition, leading
to a threshold effect of a kind that cannot occur in a small
reaction network. This threshold is closely related to Eigen’s
error threshold (Eigen, 1971). The difference is that while
Eigen’s threshold occurs in a “clean” chemistry (template
replication) with plenty of symmetry, ours occurs in a model
designed to represent a messy chemistry with little structure.

Let us imagine that there are N chemical species,
each of which can react to form some of the other species
from some surrounding milieu. The probability that species
i causes the formation of species j is assumed to be constant
(with value p) and independent of i and j.

This is essentially the same as Kauffman’s (1986) model
of the formation of autocatalytic sets, except that we have
simplified it further to remove the specific structure
associated with cleavage and ligation of peptides, since we are
interested in a much broader class of chemistries. We
interpret our model in terms of small-molecule organic
chemistry rather than peptides, as described in (Virgo and
Guttenberg, 2015; Virgo et al., 2016). As in Kauffman’s model,
this model has a percolation transition, meaning that in the
large-N limit, if p > 1/N there will be a single giant
autocatalytic set consisting of most of the species in the
system. In the simulations below we set the kinetic rates of
all the catalysis reactions to be equal, with the value k, but
one can show analytically that this makes no qualitative
difference to the results of the model, as long as the kinetic
rates are chosen from a distribution with bounded variance.

We are interested in the case where p is above the
percolation threshold. In this regime, if any amount of any
species is added to the system, it will eventually produce some
amount of almost every species. We are using this simple model
to stand in for a much more complex autocatalytic system with
a combinatorial explosion. As the amount of matter in the
system grows, it samples more and more of the combinatorially
huge space of species available to it.

Let us now suppose that within this giant autocatalytic set
there exists a smaller subset that can collectively produce its
own members at a greater rate. For ease of exposition, we
take this set to be a single species that catalyses its own
production at a rate k*, although similar results follow in the
more general case. We can ask the following question: can
this small subset produce itself faster than the “messy”
autocatalytic set that contains it? If so, then if we were to
perform this reaction experimentally, we might find that once
the “fast” set was discovered it would come to dominate
the composition, so that mass spectroscopy would reveal a
highly peaked concentration profile, rather than a
heterogeneous sampling of all possible products.

Figure 1: The relative concentration of the self-replicating
molecule, plotted as a function of its replication rate k*. For
this plot the values N = 4000 and p = h/N were used.
With a greater value of N the threshold becomes sharper.

Figure 2: Relative concentration profiles for various values
of the replicator’s growth rate k*. In each plot, the species
are enumerated along the x axis, with the relative
concentration shown on the y axis. The self-replicating species
is in the centre of the x axis. Note the stark difference
between plots (a) and (b) with k* below the threshold, versus
(c) and (d) above it.

Figure 1 shows the relative concentration of a rapidly
self-replicating species as a function of its replication rate.
(That is, we normalise the vector of concentrations by its
sum.) The threshold occurs when k* = kp/N, with k being the
rate at which the other species catalyse one another’s
formation. Below the threshold, all species grow at similar
rates. Increasing k* leads to only a slight relative increase
in the amount of the replicator molecule and no change in the
relative concentrations of the other species. However, above
the threshold the concentration profile changes, becoming
dominated by the replicator, and to a lesser extent, by the
species whose production can be directly catalysed by it, as
shown in figure 2. (This is closely analogous to the
“mutational halo” in the case of Eigen’s error threshold.)
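The qualitative effect can be explored numerically with a small Python sketch. It rests on assumptions that the text does not spell out: linear growth kinetics of the form $\dot{x}_j = k \sum_{i \to j} x_i$ (plus an extra term $k^* x_r$ for the replicator $r$), long-time relative concentrations taken as the dominant eigenvector of that linear system, and a simple power iteration to find it. The function name, the small network size and the parameter values are illustrative only and much smaller than those used for the figures.

```python
import random

def relative_concentrations(n, p, k, k_star, seed=0, iterations=300):
    """Power iteration for the long-time relative concentrations.

    Species i catalyses the formation of species j with probability p
    (rate k); the species in the middle also replicates at rate k_star.
    Returns (normalised concentration vector, replicator index).
    """
    rng = random.Random(seed)
    # targets[i]: species whose formation is catalysed by species i.
    targets = [[j for j in range(n) if j != i and rng.random() < p]
               for i in range(n)]
    replicator = n // 2
    x = [1.0 / n] * n
    for _ in range(iterations):
        new = list(x)  # the identity term keeps the iteration aperiodic
        for i, outs in enumerate(targets):
            for j in outs:
                new[j] += k * x[i]
        new[replicator] += k_star * x[replicator]
        total = sum(new)
        x = [value / total for value in new]  # normalise by the sum
    return x, replicator

for k_star in (0.1, 50.0):
    x, r = relative_concentrations(200, 5 / 200, 1.0, k_star)
    print(k_star, round(x[r], 3))
```

Sweeping `k_star` over a finer grid and plotting `x[r]` gives a curve of the kind described for Figure 1, under the assumptions stated above.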


The significance of this result lies not in the naive scenario
presented, but in showing that such threshold phenomena
can exist in messy chemistries at all. Previously two distinct
phase transitions were known in prebiotic chemistry models.
The error threshold is a phase transition in a chemistry, but
not a messy one. Kauffman’s model exhibits a phase transition
in a messy chemistry, but it is in the opposite direction
from ours (i.e. from clean to messy) since once the threshold
is passed the system explodes out into the whole combinatorial
space. This new model shows a phase transition in
which an initially messy chemistry can spontaneously constrain
itself to a specific part of its accessible phase space.

We suspect that many more threshold phenomena of this
nature will be found, in models much closer to chemical
reality than the toy one we have presented. Indeed we suspect
that the phrase “more is different” (Anderson, 1972) applies
to the size of reaction networks as much as it does to the size
of a physical system, with phase transitions being a common
and generic phenomenon in the dynamics of complex
chemistries. Perhaps in time we will find transitions that
lead not just to replication of a fixed set of species but to
a complex, metabolism-like network of molecular interactions,
or to replication with heritable variation, and thence
to natural selection.

Abstract

We present a subsymbolic Artificial Chemistry (ssAChem) in
which all properties relevant to bonding are emergent from
the underlying dynamical system (an RBN). We explore this
ssAChem by evolving a seed set of atomic particles and showing
the type of composite particles the system can produce.

INTRODUCTION 

The field of Artificial Chemistry (AChem) has produced
numerous models of chemical systems over the years. These
models often have a symbolic representation of atoms or
molecules, a defined set of possible reactions, and rely on
some form of environment such as a 2D lattice or a
well-mixed reactor (Dittrich et al., 2001).

In these approaches bonding occurs via explicitly defined
reaction rules through a grammar-like notation. A reaction
results in a new symbol replacing one in the reactor
(McMullin, 1997; Varela et al., 1974; Ono and Ikegami, 2001;
Madina et al., 2003). These systems are not always mass
conserving. In mass-conserving systems bonds form links
between two particles (Hutton, 2002, 2005, 2007). In both
cases reaction rules are explicitly defined, aiming to explore
specific behaviours. Another approach (Fontana and Buss,
1994; Dittrich and Banzhaf, 1998; Banzhaf et al., 1999;
Hickinbotham et al., 2010) uses one reactant as an operator
and another one as an operand, with the result being the
reaction product. In such systems reactions are emergent from
the properties of the particles themselves.

In subsymbolic Artificial Chemistries (ssAChems) 
(Faulconbridge et al., 2010, 2011; Faulconbridge, 2011) 
particles are defined as systems with internal properties.
Bond formation is a result of these interacting in
a predefined way. Reactions are an emergent property 
of interaction. Overall the subsymbolic system can be 
seen as being less complicated, since it does not define 
individual behaviours like most symbolic approaches do. 
However, because bonding properties are now emergent and 
the reaction algorithm universal for all possible particles, 
the system is significantly more expressive with a huge 
combinatorial search space of possible reactions. 


Random Boolean Networks (RBNs) (Kauffman, 1969; 
Kauffman et al., 2003) are an attractive dynamic system on 
which to base ssAChems. They provide a large number of 
exploitable properties, are computationally inexpensive and 
can easily be combined to produce analogues to molecular 
structures. RBN-World (Faulconbridge et al., 2011) itself, 
while having emergent bonding properties, still has many 
of its properties externally defined. Here we introduce the 
Spiky RBN model, and a simple reaction mechanism which 
allows for bond formation and decomposition. As in RBN-World,
we use an RBN as the subsymbolic system. However,
our mapping of subsymbolic to atomic particle properties 
is significantly different. Our aim is to build an ssAChem 
where all properties relevant to bonding are fully emergent 
from the underlying structure and dynamics. 

The Spiky RBN Model 

Here we first introduce the Spiky RBN, which defines our
atomic particles and their properties. Next we give the collision
and stability criteria, which determine if a link can be
formed and if a link is stable. Then we explain the linking
mechanism that describes what a link consists of and how
to form one. To emphasise that we are not simulating real
atoms or chemicals, we use the following terminology. Particles
are connected by links. An atomic particle is a particle
with no links (in our model it is a single Spiky RBN); no
operations within our system can break down an atomic particle.
A composite particle consists of two or more atomic
particles connected via links. References to a particle mean
either an atomic or a composite particle.

The boolean networks are random since our interest is in
emergence of properties and dynamics as opposed to engineering
of properties towards specific dynamic behaviour.
Engineering specific boolean networks would limit the
system's dynamics to only those that we have encoded.

Atomic Particle 

An atomic particle is a small RBN, and its properties emerge
from the dynamics. Particles are linked to form larger RBNs,
with their own emergent dynamics. In the original RBN-World,
particles had two special ‘bonding nodes’ added arbitrarily;
in our Spiky RBN model, the number, location,
and properties of these nodes are emergent.

Figure 1: Diagram of the Spiky RBN. Nodes are separated
into Interaction Lists (IL) of varying size based on Algorithm 1.
Edges in solid lines are those which are part of the
IL and can be used in linking. Dashed line edges are part
of the RBN topology that cannot be changed due to linking.
Note that there is an IL with only one node and no edges; it
cannot form a link. Each IL spike can be seen, with colour
denoting sign and size denoting magnitude.

The core of the model is the subsymbolic representation
of the atomic particle. The RBN is split into Interaction Lists
(IL) made up of RBN nodes as shown in Fig. 1. Each IL
is a list of nodes where each subsequent node takes direct
input from the previous node in the list. The first node is
the only one with unspecified inputs. ILs are constructed by
following connected nodes where the next node is chosen
based on the number of its outgoing edges, as defined in
Algorithm 1. We tried several methods of picking the first
and subsequent nodes; choosing the least influential node gave
a good balance between number of nodes per IL and number of ILs.

The ILs partition the nodes: every node belongs to exactly
one IL, and every IL has at least one node in it. ILs have no
effect on the topology of the RBN; they are a logical grouping.
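
The construction in Algorithm 1 can be sketched in Python as follows. This is a minimal sketch rather than the authors' implementation. It assumes the RBN wiring is given as `inputs[v]` (the list of nodes that node `v` reads from), that "least influential" means fewest outgoing edges, and that ties are broken by node index; the function name `build_interaction_lists` is hypothetical.

```python
def build_interaction_lists(inputs):
    """Partition RBN nodes into Interaction Lists (ILs).

    inputs[v] is the list of nodes that node v takes input from.
    """
    n_nodes = len(inputs)

    # Out-degree of each node: how many distinct nodes read from it.
    out_degree = [0] * n_nodes
    for v in range(n_nodes):
        for u in set(inputs[v]):
            out_degree[u] += 1

    # Assumption: "least influential" = fewest outgoing edges,
    # ties broken by node index.
    remaining = sorted(range(n_nodes), key=lambda v: (out_degree[v], v))

    interaction_lists = []
    while remaining:
        n = remaining.pop(0)
        il = [n]
        while True:
            # First remaining node (in influence order) that reads from n.
            successor = next((m for m in remaining if n in inputs[m]), None)
            if successor is None:
                break
            remaining.remove(successor)
            il.append(successor)
            n = successor
        interaction_lists.append(il)
    return interaction_lists


# Three nodes: 0 reads 1, 1 reads 0, 2 reads 0. Nothing reads node 2,
# so it forms an IL of size 1 that cannot take part in a link.
print(build_interaction_lists([[1], [0], [0]]))  # [[2], [1, 0]]
```

By construction every node ends up in exactly one list, and each node after the first in a list takes input from its predecessor.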

Data: N: list of all nodes in the RBN ordered by least
      influential first
while N is not empty do
    Remove first node n from N;
    Create new Interaction List IL_i;
    Add n to IL_i;
    while ∃ n' ∈ N where n is an input to n' do
        Remove n' from N;
        Add n' to IL_i;
        n ← n';
    end
    i++;
end

Algorithm 1: Building Interaction Lists

ILs form the basis for interaction between particles in the
Spiky RBN model, replacing the arbitrary binding site in
RBN-World (Faulconbridge et al., 2011). The size of the IL
(the number of nodes it contains) and the number of ILs in a
particle are all derived purely from the topology of the RBN.
Each IL has a numerical property referred to as the spike
(Fig. 1), which determines if a link will form and remain
stable. The spike value is calculated over the attractor cycle
as follows.

The value of node $x$ at RBN state $s$, $x_{s_{value}}$, is 1 if it is in
a ‘true’ state and $-1$ if it is in a ‘false’ state.

\[ x_{s_{value}} = \begin{cases} 1, & \text{if } x_{s_{state}} = T \\ -1, & \text{if } x_{s_{state}} = F \end{cases} \qquad (1) \]

The value of node $x$ over one cycle of the attractor, $x_{value}$, is

\[ x_{value} = \sum_{s=t}^{t+c} x_{s_{value}} \qquad (2) \]

where $t$ is the first state of the attractor and $c$ is the attractor
length.

The spike for $IL_i$ of particle $A$, $S_{Ai}$, is the sum of all node
values for nodes in that IL.

\[ S_{Ai} = \sum_{x \in IL_i} x_{value} \qquad (3) \]

This gives us a spike with both a magnitude and a sign.
It is constrained by the attractor length $c$ and the number of
nodes in $IL_{Ai}$, written $IL_{Ai_{size}}$:

\[ -IL_{Ai_{size}} \, c \leq S_{Ai} \leq IL_{Ai_{size}} \, c \qquad (4) \]

We calculate the attractor of an RBN from an initial state
of all ‘false’.
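
The spike calculation of Eqs. (1) to (4) can be illustrated with a short Python sketch. It is a sketch under stated assumptions, not the authors' code: the RBN is updated synchronously, each node's Boolean function is stored as a truth table indexed by its input bits, and node values are summed over the $c$ states of one attractor cycle. The names `step`, `attractor_cycle` and `spikes` are hypothetical.

```python
def step(state, inputs, tables):
    """One synchronous update: each node looks up its truth table
    at the index formed by the current values of its inputs."""
    new_state = []
    for node_inputs, table in zip(inputs, tables):
        index = 0
        for u in node_inputs:
            index = (index << 1) | state[u]
        new_state.append(table[index])
    return tuple(new_state)


def attractor_cycle(inputs, tables):
    """States of the attractor reached from the all-'false' state."""
    state = tuple(0 for _ in inputs)
    seen = {}
    trajectory = []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state, inputs, tables)
    return trajectory[seen[state]:]


def spikes(inputs, tables, interaction_lists):
    """Spike of each IL: sum over its nodes of (+1 per 'true' state,
    -1 per 'false' state) across one attractor cycle."""
    cycle = attractor_cycle(inputs, tables)
    node_value = [sum(1 if s[v] else -1 for s in cycle)
                  for v in range(len(inputs))]
    return [sum(node_value[v] for v in il) for il in interaction_lists]


# Two nodes: node 0 is constantly true, node 1 copies node 0.
inputs = [[1], [0]]
tables = [[1, 1], [0, 1]]
print(spikes(inputs, tables, [[0, 1]]))  # [2]
```

In this toy network the attractor is the single state (true, true), so $c = 1$ and the one IL of size 2 reaches the upper bound of Eq. (4).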

An atomic particle has three properties: the number of
ILs, the size of each IL, and the spike of each IL. The first
two are a function of the RBN topology and the third is a
function of the RBN dynamics. Because all are deterministically
calculated, two identical RBNs will produce two identical
atomic particles with identical behaviours. The number
of ILs gives the maximum number of links that a particle
can form. The size of each IL determines if it can be part of
a link and how severely a link will change the topology of
the particle. The spike dictates which specific set of bonds
is possible.
Collision and Stability Criteria

The second component of the model is the pair of collision and
stability criteria. The collision criterion dictates what must
be true in order for a link to form. The stability criterion
dictates what must be true in order for a link to continue to
exist.

A link can form between ILs chosen from two different 
particles. The ILs must have size > 1 in order for a link to be 
possible. How particles and ILs are chosen depends on the 
reactor type. In an aspatial well mixed reactor two random 
particles can be chosen and a random IL on each. In a 2D 
lattice reactor particles in adjacent sites and the nearest ILs 
can be chosen. 

The collision criterion states that:

\[ S_{iA} + S_{jB} = 0 \qquad (5) \]

where $S_{iA}$ is the spike of the $i$th IL of particle $A$. If the collision
criterion is not met, the collision is considered elastic
and the two particles do not form a link. If the collision
criterion is met then a link forms as described below. Link
formation results in a change in RBN topology and consequently
in a possible change to partial linking properties.
After the bond construction all partial linking properties are
recalculated and used to check against the stability criterion.

Like the collision criterion, the stability criterion states that:

\[ S'_{iA} + S'_{jB} = 0 \qquad (6) \]

where $S'_{iA}$ is the spike of the $i$th IL of particle $A$ after the
bond has been formed. The stability criterion is checked not
only for the newly formed link, but for every pair of ILs that
are part of a link in the new composite particle. Any link that
fails the criterion decomposes. Decomposition
results in a particle splitting into two or more fragments.
Each IL that was part of a now broken link is free again. This
means the topology has changed again, so the stability criterion
is checked recursively until all links in all fragments meet
the criterion. Because the stability criterion is checked for
all links, it holds true for every composite particle that is not
currently attempting to link.

Link Structure and Formation 

If the collision criterion holds then a link is formed. This
is done by swapping pairs of node inputs between the two
ILs as shown in Fig. 2. Edges which are part of the IL
are swapped, starting with the edge outputting from the first
node in the IL.

The maximum number of swaps possible is $n - 1$, where
$n$ is the size of the smaller of the two ILs.

A link can be constructed between any two ILs of size > 1.
ILs of size 1 cannot link because they have no edges to swap.
(It is possible to have a node that takes input from itself. We
do not consider these.)
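
One possible reading of the collision check and the input swap is sketched below in Python. The details are assumptions, since the text and Fig. 2 do not fully specify them: the two particles are represented as one combined wiring list `inputs`, the $k$th IL edge of one particle is exchanged with the $k$th IL edge of the other, and exactly $n - 1$ swaps are made. The names `collision_criterion` and `form_link` are hypothetical.

```python
def collision_criterion(spike_a, spike_b):
    """Eq. (5): a link may form only if the two spikes cancel."""
    return spike_a + spike_b == 0


def form_link(inputs, il_a, il_b):
    """Return a new wiring in which IL edges are swapped pairwise.

    inputs[v] lists the nodes that node v reads from; il_a and il_b are
    ILs (ordered node lists) belonging to two different particles.
    The original wiring is left untouched so the swap can be reversed.
    """
    if len(il_a) < 2 or len(il_b) < 2:
        raise ValueError("ILs of size 1 have no edges to swap")

    new_inputs = [list(node_inputs) for node_inputs in inputs]
    swaps = min(len(il_a), len(il_b)) - 1
    for k in range(swaps):
        src_a, dst_a = il_a[k], il_a[k + 1]
        src_b, dst_b = il_b[k], il_b[k + 1]
        # The IL edge src_a -> dst_a becomes src_b -> dst_a, and vice versa.
        new_inputs[dst_a][inputs[dst_a].index(src_a)] = src_b
        new_inputs[dst_b][inputs[dst_b].index(src_b)] = src_a
    return new_inputs


# Particle A is nodes 0..2 (a ring), particle B is nodes 3..4.
inputs = [[2], [0], [1], [4], [3]]
print(form_link(inputs, [0, 1, 2], [3, 4]))  # [[2], [3], [1], [4], [0]]
```

Keeping the original wiring unchanged makes decomposition a matter of restoring it, which matches the description of decomposition as reversing the input swaps.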

After link formation the spikes of all ILs in the new composite
particle are recalculated, since link formation results
in a change in the underlying RBN topology. Any link
that does not meet the stability criterion is decomposed
by reversing the input swaps. When a link decomposes the
break results in two new particles; again the spikes of the
ILs are recalculated, and any further decomposition needed
is performed. This process continues until the products are
stable (meet the stability criterion), hence a single interaction
between two reactants can result in multiple product
particles. The algorithm for two atomic particles is shown
in Fig. 3.

Figure 2: An example of the structure of a link between
particles A and B. Note that two pairs of edges are swapped
since ILba has a size of 3.

Our links have a richer structure than those in the original 
RBN-World (Faulconbridge et al., 2011). Small ILs mean 
fewer swaps to form a link, resulting in less perturbation 
to the linking particles. This implies a higher chance that 
the spikes do not change and the link is stable. Larger ILs 
produce a larger change in topology and are therefore more 
likely to result in different spikes and so bond instability. 

Experiment: Growing a Seed Set 

The aim of the experiment is to use an evolutionary approach
to generate interesting seed sets of atomic particles. One of
the long term goals of the project is to add further parameters
such as kinetics, variable link strengths, and geometry
to the model, in order to see the effects these properties have
on the dynamics of the system. In order to understand and
compare the effects of these parameters we need an exemplar
dynamic system: a set of atomic particles with nontrivial
dynamic properties. Finding such a set can be difficult
since the search space is vast and there is no way to predict
dynamic behaviour without simulation. We can state undesirable
characteristics:

• Overly restrictive : most reactions are elastic and do not
result in larger composite particles. The system quickly
reaches a stable state with no reactions occurring.

• Overly permissive : almost all reactions result in stable
links. The system quickly congeals into a single large
composite particle.

• Chaotic : almost all reactions between composite particles
result in decomposition of links. The system is reactive
but larger composites are quickly destroyed.

Figure 3: Reaction algorithm between two atomic particles.
First an IL is selected on each. The collision criterion is
checked. If it passes then a link is formed (product particle
AB). The stability criterion is checked for the ILs that are
part of the link. If it fails then the link decomposes. Note that
ILs not involved in links may have had their spikes changed
due to the effect of the link on the new composite RBN.

Reactor 

For this experiment we use an aspatial reactor initialised
with 20 unique atomic particles. The reactor attempts 1,000
links by picking two particles at random, picking an IL on
each at random, and attempting a link. If the reaction is successful
all reactants and products are added to the reactor.

At any time the reactor contains one copy of each composite
particle that has been generated so far, plus the initial
20 atomic particles. In effect our system is a well stirred
reactor with equal concentrations of all particles. This is a
rough exploration of the possible behaviour the seed set can
produce.
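
The reactor loop described above might look like the following Python sketch. The reaction itself is abstracted behind a caller-supplied function, here called `try_link`, which is hypothetical: it is assumed to choose an IL on each particle, apply the collision and stability criteria, and return the list of stable products (empty for an elastic collision). Particles are assumed to be hashable, and the two reactants are drawn with replacement so that a particle can meet itself.

```python
import random


def run_reactor(seed_particles, try_link, attempts=1000, rng=None):
    """Aspatial reactor holding one copy of every particle seen so far."""
    rng = rng or random.Random()
    reactor = list(seed_particles)
    known = set(reactor)
    for _ in range(attempts):
        # Every particle is equally likely, whatever its size or history.
        a = rng.choice(reactor)
        b = rng.choice(reactor)
        for product in try_link(a, b, rng):
            # Only previously unobserved products are added.
            if product not in known:
                known.add(product)
                reactor.append(product)
    return reactor
```

Because each particle is stored once, sampling uniformly from `reactor` corresponds to the equal-concentration assumption stated in the text.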

Fitness Metrics 

We use a fitness function to describe the type of system we
are looking for. We calculate a fitness per seed set in the
population, as well as a fitness per atomic particle. There
are five measures on which we base our fitness function:

C The number of unique composite particles that the system
creates after a set number of reaction attempts.

V Variance in observed composite particle size.

L Variance in number of links per atomic particle in a composite
particle.

R Percentage of attempted reactions for which the new
bond is stable.

P Number of unique links that an atomic particle has been
observed as forming. For example if atomic particle A
has only ever formed bonds with atomic particle B and
itself, then P = 2.

These characteristics form the basis of our fitness function.
C, V, L and R provide an overall fitness for our seed
set. P provides an individual fitness for each atomic particle
within the seed set. We use a rank based approach, which
removes the need to provide weights for the components of
the fitness function.
Fitness Functions

The fitness of reactor $i$ is:

\[ f_{ri} = Rank(C_i) + Rank(V_i) + Rank(L_i) + Rank(R_i) \qquad (7) \]

where $Rank(C_i)$ is the rank of reactor $i$ when the reactors
are ordered by $C$ from lowest to highest (and likewise for
$V$, $L$ and $R$). Since our population is
made of 20 individuals and there are four ranks, $f_{ri}$ is constrained to

\[ 4 \leq f_{ri} \leq 80 \qquad (8) \]

The fitness of atomic particle $j$ in reactor $i$ is the number of
unique bonds it can form, $P_{ij}$:

\[ f_{ij} = P_{ij} \qquad (9) \]

With a seed set of 20 atomic particles $f_{ij}$ is constrained to

\[ 0 \leq f_{ij} \leq 20 \qquad (10) \]


The mutation function replaces the atomic particle $j$ in reactor
$i$ with a new random one with a probability proportional
to $M_{ij}$, where

\[ M_{ij} = \frac{84 - f_{ri}}{1 + f_{ij}} \qquad (11) \]

That is, fitter reactors and fitter particles are mutated less.
Given the bounds in (8) and (10), $M_{ij}$ is constrained to

\[ 0.19 \leq M_{ij} \leq 80 \qquad (12) \]
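
As a worked illustration of Eqs. (7) to (12), the rank-based fitness and the mutation weight can be written in a few lines of Python. This is a sketch only: the text does not say how ties in rank are resolved, so breaking them by position is an assumption, and the function names are hypothetical. The constant 84 is taken from Eq. (11) and presumes 20 reactors and four ranked measures.

```python
def ranks(values):
    """Rank 1 for the lowest value, len(values) for the highest.
    Assumption: ties are broken by position in the list."""
    order = sorted(range(len(values)), key=lambda i: (values[i], i))
    rank = [0] * len(values)
    for position, i in enumerate(order, start=1):
        rank[i] = position
    return rank


def reactor_fitness(C, V, L, R):
    """Eq. (7): sum of the four per-measure ranks for each reactor."""
    per_measure = [ranks(m) for m in (C, V, L, R)]
    return [sum(r[i] for r in per_measure) for i in range(len(C))]


def mutation_weight(f_reactor, f_particle):
    """Eq. (11): larger for unfit reactors and unfit particles."""
    return (84 - f_reactor) / (1 + f_particle)


# The two extremes reproduce the bounds of Eq. (12).
print(round(mutation_weight(80, 20), 2))  # 0.19
print(mutation_weight(4, 0))              # 80.0
```

The lower bound is $4/21 \approx 0.19$ (best reactor, best particle) and the upper bound is $80/1 = 80$ (worst reactor, a particle that formed no links).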

Exploratory Algorithm 

To generate atomic particle sets we use an algorithm similar
to clonal selection (De Castro and Von Zuben, 2000,
2002) (see Fig. 4). Our population is made of 20 reactors.
Unlike normal clonal selection, each population member
produces exactly one clone by mutating atomic particles
based on $M_{ij}$. This is because our aim is not specific optimisation
but rather exploration for possible seed sets with
favourable behaviours. In order to ensure that good seeds
propagate through the generations we include a low 5%
crossover chance. The crossover function replaces the three
lowest participation particles in a set with the three highest
participation particles from another set. When crossover
occurs we ensure that the resultant seed set has 20 unique
atomic particles by making sure the incoming particles are
not already in the seed set. The crossover probability is kept
low because again we are interested in diversity. Also, due
to the nature of the system, high fitness of a particle in one
reactor does not necessarily imply high fitness in another reactor.
This is a desirable trait since high fitness in all reactors
would suggest the particle is overly permissive and can bond
with almost everything.


Results 

The experiment was run with RBNs of $K = 2$, $N = 12$
forming atomic particles. We first look at the behaviour of
the reactors over the generations and then give an example
particle from the best reactor at the end of the run.


Figure 4: Exploration algorithm. The algorithm is initialized
with 20 sets of 20 atomic particles each. Each reactor
then attempts 1,000 reactions. We then calculate reactor fitness
and particle fitness, mutate and perform crossover to
get the new population for gen n + 1. This repeats for 100
generations.


Exploratory Algorithm Performance 

Fig. 5 shows how the values of C, V, L and R change over 
the generations. 

Over the generations there is an increase in the distribution's
upper quartiles, suggesting some reactors are improving
and finding better seed sets.

The median values in each graph fluctuate, which is to be
expected: even if no mutation or crossover is experienced
there is no guarantee that a successful reactor will reproduce
its behaviour in the following generation. Because reactions
are randomly chosen it is possible that a very reactive composite
particle is not generated even if one is possible.

The median variance in size (Fig. 5b) stays very low for
most of the experiment. This is due to reactors producing
only composite particles of one size (most commonly size
2) giving a variance of 0.

Generation 68 shows a large increase in variance in particle
size (Fig. 5b) compared to the previous generation. For
that generation there is also an increase in median number of
unique particles (Fig. 5a), and number of bonds formed (Fig.
5d) compared to the previous generation. The median variance
in number of bonds per particle (Fig. 5c) is lower than in the
previous generation, however. This suggests that gen 68 produced
many large composites which were mostly straight
ribbons of particles with low branching.

The large number of outliers shows that while these low-reactivity
systems are common we can also find more interesting
examples. The reduction in outlier numbers towards
the end of the run together with an increase in distribution
variance suggests that fit individuals have a positive influence
on the population.

Figure 5: Distribution of the reactor measures per generation; some outliers are omitted. (a) C, number of unique particles; (b)
V, variance of particle size; (c) L, variance of number of links; (d) R, stability. Generation 68 is highlighted for reference.

Figure 6: Distribution of the evolutionary activity measure QNN per generation for each reactor.

Reactor   C     V        L         R
0         46    2.3667   0.18255   40
1         63    4.3401   0.19970   59
2         129   8.3091   0.20915   83
3         586   430.3    0.51866   307
4         76    7.6058   0.17437   58
5         229   21.694   0.38975   137
6         59    1.8248   0.17602   54
7         139   6.8823   0.27666   116
8         501   1607.6   0.36968   402
9         453   128.54   0.27755   298
10        149   16.221   0.20390   93
11        39    2.5063   0.12235   34
12        566   116.41   0.36923   280
13        274   398.26   0.11454   166
14        608   368.44   0.29997   338
15        266   31.143   0.28607   158
16        183   25.329   0.34497   140
17        58    4.797    0.18159   45
18        179   61.056   0.48978   114
19        50    2.2724   0.12439   38

Figure 7: Gen 99 Reactors

Figure 8: A composite particle produced by reactor 3,
made of 65 atomic particles from 7 different species
(T,K,L,S,F,M,D).

In order to check if the exploratory algorithm is producing
positive evolutionary activity we use the QNN measure
(Droop and Hickinbotham, 2012). Fig. 6 shows the QNN
distribution per reactor over the generations. Large increases
in the QNN distribution, especially towards the latter third of
the experiment, suggest a period of strong evolutionary activity
for most reactors. Large positive outliers are single
reactors which are showing high activity, likely due to mutation.
Again due to the random nature of the reactor we see
fluctuations between generations.

Example product 

Fig. 7 shows the reactor metrics for each reactor at the end
of the last generation. In the final run four of the 20 reactors
produced over 500 unique composite particles. Of these the
best (based on $f_r$) was reactor 3, producing 586 particles of
which 307 had newly created stable bonds.

Fig. 8 shows one of the largest generated composite particles.
The main chain consists of T and K atomic particles.
We see branching along the chain showing that the T atomic
particle is capable of three links. Interestingly we also see
that particle L allows other, non-K or -T particles to join
the chain (specifically F, M and S). This gives the product
compositional diversity. Like the T particle, L is capable
of forming up to three links. However the reactor does not
contain any long chain L composites, suggesting that L-L
links are unstable. While the particle in Fig. 8 is stable there
are still T atoms with only one link, suggesting that it could
grow even further.

This particle is a product of 47 unique reactions. Most
produce exactly one new unique composite particle. However
three of the reactions produce two unique particles.
This suggests that most reactions are combinatorial in nature.
While reactions can produce multiple products, only
previously unobserved products are recorded and added
back to the reactor.

Branching is common in the final reactors; 15 out of 20 reactors
have at least one composite where an atom has three
links. However we have not observed a particle that can
form four or more links. It is possible that changing the way
ILs are constructed to ensure that each particle has at least 4
linking sites of size >2 would give more branching. However
engineering particles in such a way is contrary to the
core principle of having emergent properties. A more consistent
approach would be to find naturally occurring particles
of that nature and introduce them into the seed set.

Conclusions and Outlook 

We have presented a new ssAChem based on the Spiky RBN
model of an atomic particle. The core design principle of
the model is to derive all properties relevant to linking from
the emergent properties of the RBN. We have shown that
the method is capable of producing large composite particles
with varying structures. Seed sets of atomic particles
have been found which are reactive and produce a variety of
possible composite particles, as well as a range of reaction
paths which can be further explored.

Overall the Spiky RBN model seems to be a viable option 
for a fully emergent ssAChem. Future work will focus on 
expanding the available mechanisms beyond simple bonding 
and stability. 

Firstly energetics will be introduced. Bonding and decomposition
will depend on meeting collision and stability
criteria as well as a probability proportional to reactor temperature.
This could result in a relaxation of the stability
criteria allowing for more composite species to exist for a
short time.

Secondly a spatial 2D reactor will be introduced. Geometry
of composites will be determined by number of bonds
per particle as well as angles between bonds. The values will
again be emergent from the sRBN properties.

As well as bonding, weaker inter-particle interactions
could be considered. This could allow the emergence of organisation
within a spatial reactor.

The sRBN model provides us with the flexibility to introduce
the above mechanisms in a way that is fully emergent
from the underlying organisation.
Abstract

We demonstrate the emergence of spontaneous temperature
regulation by the combined action of two sets of dissipative
structures. Our model system comprised an incompressible,
non-isothermal fluid in which two sets of Gray-Scott reaction
diffusion systems were embedded. We show that with a temperature
dependent rate constant, self-reproducing spot patterns
are extremely sensitive to temperature variations. Furthermore,
if only one reaction is exothermic or endothermic
while the second reaction has zero enthalpy, the system
shows either runaway positive feedback, or the patterns inhibit
themselves. However, a symbiotic system, in which one
of the two reactions is exothermic and the other is endothermic,
shows striking resilience to imposed temperature variations.
Not only does the system maintain its emergent patterns,
but it is seen to effectively regulate its internal temperature,
no matter whether the boundary temperature is warmer
or cooler than optimal growth conditions. This thermal homeostasis
is a completely emergent feature.

Introduction 

Life is a quasi-miraculous panoply of controls, feedbacks,
interactions and diversity. Every time we think we have discovered
its final limits, we stumble upon hidden surprises
that propel us to once more rip up the rulebook. It remains
arguably the greatest intellectual challenge of our time to understand
how evolutionary forces have picked, crafted and
re-worked physical and chemical mechanisms such that life
wins. What makes this quest so difficult is that we still struggle
to define the game that life is playing, and it also seems
that some players have found ways to modify the rules mid-play
and perhaps even cheat.

In order to break into the vault of life’s mysteries, we
must seek plausible trajectories. Trajectories that life may
have taken here on Earth, that it could have taken, or even
could take in other scenarios, to get from dead chemistry to
open-ended complexity. One of the most characteristic features
of organisms is their ability to carry out regulation and
stabilisation in the face of external change. The reactions
and interactions that life depends on cannot take place in an
arbitrary range of conditions; in fact many have strict limitations.


The idea of homeostasis conjures ideas of regulating cell 
salinity, glucose concentration, body temperature or cell 
membrane lipid composition, to name but a few. In fact, it is 
a fundamental and necessary feature of all living organisms 
that some minimal set of internal variables are maintained 
within viable ranges. 

It is easy to imagine many forms of proto-life, early in the
history of our planet (and perhaps also other worlds), which
exhibited only some, not all of the characteristics of life as
we know it today. For example, one could think of an early
metabolic reaction set being driven by a particular geochemical
gradient. Out of the space of possible such reaction sets,
there are a large number which are very sensitive to the exact
details of the local conditions. So if such conditions were to
change, many of those possible metabolisms would cease to
function.

One can also imagine early forms of life that produced
some kind of waste product that was toxic in sufficiently
high concentrations. In this scenario early life could have
easily poisoned itself into oblivion. This of course brings
to mind the example of free oxygen production by photosynthetic
organisms, and the danger posed to many forms of
life from that oxygen.

Broadly speaking, part of the solution that life stumbled
upon was cooperation. If there is a growing excess of a particular
substance and there is the possibility that a new organism
can somehow make use of that substance and move
the system towards being materially cyclic, then it’s likely
that such an arrangement will emerge and persist. To be
thermodynamically consistent, the new organism may have
to use a novel energy source to carry out its recycling stage,
and there are many examples of this in both modern and past
life.

Once there are several ecological interactions taking place
between a set of organisms, there will inevitably be feedbacks.
Feedbacks between population sizes, concentrations
of key chemical species, environmental variables, and many
other factors. Despite the fact that such feedbacks did not
arise from any intentional design process, those that persisted
over evolutionary time now appear to be very well
tuned. Many negative feedbacks that we can identify appear
to allow organisms to survive in a range of conditions, particularly
those that are normally unfavourable to their emergence
or survival. We can use hindsight and place evolutionary
explanatory frameworks upon these observations, making
arguments such as “groups of self-regulating organisms
would have had a significant selective advantage over those
groups that were unable to control key environmental parameters.”
However, the true chronology of life’s discovery
of these clever techniques is veiled behind a shadow of a
meandering story of change, and lost historical information.

Thus we are compelled to seek systems in which such biological
functions emerge spontaneously, and perhaps get a
window on possible narratives for life’s ascension. Whether
such sequences of events are similar to how life as we know
it arose is very difficult, if not impossible, to truly know.
However, if we find that amongst the space of complex
driven systems, a large number produce life-like dynamics,
we at least can get a first foothold on the inevitability of life
in the universe.

Our previous research sought to elucidate conditions in
which non-living systems express life-like characteristics
(Bartlett, 2014; Bartlett and Bullock, 2015; Bartlett et al.,
2010), and in this work we continue in that endeavour. Having
discovered systems in which non-living patterns spontaneously
compete with one another for a common free energy
source (Bartlett and Bullock, 2015), we questioned whether
cooperation or symbiosis might also readily emerge in simple
physico-chemical systems.

Such dynamics can indeed be observed, and in this paper
we will explore the spontaneous emergence of a temperature
regulation mechanism in a shockingly simple non-living
system.

We will first define the system in question and summarise 
its normal characteristic features. Then the modelling 
framework used to carry out our simulations will be briefly 
described. The next sections document the phenomenology 
of thermal Gray-Scott reaction diffusion (GSRD) structures 
and their resilience (or sensitivity) to temperature changes. 
We then illustrate a robust thermal homeostasis mechanism 
that emerges from the combined action of two sets of GSRD 
systems, before drawing conclusions in the final section. 

The Quintessential Pattern-Former 

The original GSRD system consists of two chemical species
$A$ and $B$, which are free to diffuse and react within a
two-dimensional domain (Gray and Scott, 1985, 1994; Lee et al.,
1994; Mahara et al., 2008; Pearson, 1993; Virgo, 2011).
Species $A$ is fed into the system at all points via porous
walls, at a rate equal to $F(1 - \psi_A)$, where $F$ is a constant
and $\psi_A$ is the concentration of $A$ at that location. There is
a non-linear autocatalytic reaction between the two species:
$A + 2B \rightarrow 3B$. Substance $B$ is removed at a rate $(F + R)\psi_B$,
where $R$ is a positive parameter (removal rate) which specifies
the rate of removal of substance $B$ (over and above the
feed rate $F$).

Despite its bare simplicity, this system is capable of exhibiting
a myriad of dynamic chemical structures. Of course
there are two trivial, non-structured attractors as well: one in
which the reaction rate drops to 0 because B has all but disappeared
from the system (perhaps because the reaction rate was
too low to keep pace with the rate of removal of B, and hence
$\psi_A \to 1$ due to the boundary conditions), and one in which
the reaction occurs at a constant rate homogeneously across
the domain and both species exist at finite concentrations
which do not vary with space or time.
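
The two non-structured attractors can be located by setting the spatial and temporal derivatives to zero. The sketch below is a minimal illustration, assuming a mass-action reaction term $k \psi_A \psi_B^2$ for $A + 2B \rightarrow 3B$; the rate constant `k` and the function name are hypothetical and are not specified at this point in the text. Solving $F(1 - \psi_A) = k \psi_A \psi_B^2$ and $(F + R)\psi_B = k \psi_A \psi_B^2$ for $\psi_B \neq 0$ gives a quadratic in $\psi_B$.

```python
import math

def homogeneous_states(F, R, k=1.0):
    """Spatially uniform steady states (psi_A, psi_B).

    Assumes the rate law k * psi_A * psi_B**2 (k is a hypothetical
    rate constant). The state (1, 0) always exists; reactive states
    exist only when the quadratic
        (F + R) * b**2 - F * b + F * (F + R) / k = 0
    has real roots.
    """
    states = [(1.0, 0.0)]  # B absent, A at its feed concentration
    disc = F * F - 4.0 * F * (F + R) ** 2 / k
    if disc >= 0.0:
        for sign in (-1.0, 1.0):
            b = (F + sign * math.sqrt(disc)) / (2.0 * (F + R))
            a = (F + R) / (k * b)
            states.append((a, b))
    return states

# With F = 0.03 and R = 0.061, a reactive uniform state requires
# k >= 4 * (F + R)**2 / F.
print(homogeneous_states(0.03, 0.061, k=1.0))
print(homogeneous_states(0.03, 0.061, k=1.2))
```

Under these assumptions the reactive uniform state only appears once the rate constant exceeds $4(F + R)^2 / F$, which is one way to see why a faster reaction favours the homogeneous attractor.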

Nonetheless, under a small but finite range of conditions,
and when the system is initialised in the correct way, a broad
range of stable structures can be observed (structures in the
concentration fields of the two species). These patterns are
the emergent result of a frustration between the inward diffusion
and supply of A, the transformation of A into B, and
the outward diffusion and ubiquitous removal of B. If there
is a point at which $\psi_A$ is relatively low, $\psi_B$ is relatively high,
and the reaction turns A into B faster than B is lost from that
region, then a small structure, or soliton, can persist. There
are patterns of many different morphologies, also emerging
from a similar balance of physical effects. In this paper
we are interested purely in the well-known, self-reproducing
spot patterns (Lee et al., 1994; Pearson, 1993; Virgo, 2011).

Traditionally, these systems are modelled under an
isothermal assumption, so temperature plays no role in their
dynamics. However, the role of temperature in chemical kinetics
could not be more essential. In recent years we have
carried out a range of studies on the effects of adding thermal
dependencies and interactions to GSRD systems (Bartlett,
2014).

In the investigation documented here, the system in question
was a duo of GSRD systems. Having observed a competitive
dynamic between GSRD spots and convection cells
in our previous work (Bartlett and Bullock, 2015), we intended
to go further and explore the possibility of cooperation
between dissipative structures. Thus in this paper,
there are two GSRD systems placed within the same domain.
They are dissolved in a solvent fluid which flows
and advects any passive scalars according to conventional
incompressible flow dynamics. The temperature is of course
also free to vary as a function of space and time. We simply
need to control the temperature at the upper and lower
boundaries of the system. Chemical reactions can also release
or absorb a certain amount of heat (per unit reaction
rate), quantified by the enthalpy change $\Delta H$ (negative for
exothermic reactions and vice versa).

The next section will describe the numerical algorithm 
that allowed us to re-create these model systems in silico. 




Modelling Framework 

The simulation of non-isothermal fluids advecting chemical
species, which themselves are undergoing temperature-sensitive
reactions, is not a lightweight undertaking. While
various ‘traditional’ numerical methods (e.g. methods based
on numerical integration or discretisation, such as multiphysics
solvers like COMSOL) exist for the solution of the
governing equations of these systems, they come with a significant
computational burden and tend to lack transparency.
Since we were not concerned with absolute predictive accuracy,
but instead with carrying out “opaque thought experiments”
(Di Paolo et al., 2000), we made use of an extended
Reactive Thermal Lattice Boltzmann Model (RTLBM), developed
by Bartlett (2014).

The basic LBM has been extensively used and evaluated
for a variety of fluid flows, and its accuracy and efficiency
for simple flows is now well established (see e.g.
Wolf-Gladrow, 2000; Chen and Doolen, 1998, and references
therein). The LBM has also been extended to include
buoyancy driven convection of non-isothermal fluids,
and its effectiveness for such flows is also well established
(He et al., 1998; Shan, 1997; Peng et al., 2003).

LBMs capable of simulating reactive flows were less
common until recent years, but these models are now
steadily reaching maturity (Ayodele et al., 2011, 2013;
Frouzakis, 2011). The specification, development and testing
of the exact RTLBM used in this paper is described fully
in Bartlett (2014) and Bartlett and Bullock (2015), and hence
we will not go into any further modelling details here.

Instead we will simply re-state the governing equations of
the system in question. The flow field obeys the standard
Navier-Stokes equations for an incompressible fluid that
can experience buoyancy forces (under the Boussinesq approximation).
The temperature field follows the advection-diffusion
equation of a passive scalar (with sources and sinks
from chemical reactions and boundary conditions):


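$$\frac{\partial T}{\partial t} = \chi \nabla^2 T - \nabla \cdot (\mathbf{u} T) - A e^{-E_a/T} \sum_{i=1}^{2} \Delta H_i \, \psi_{A_i} \psi_{B_i}^2 \qquad (1)$$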

Finally, the four chemical species concentration fields obey 
reaction-diffusion-convection equations: 
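
$$\frac{\partial \psi_{A_i}}{\partial t} = D_{A_i} \nabla^2 \psi_{A_i} - \nabla \cdot (\mathbf{u} \psi_{A_i}) - A e^{-E_a/T} \psi_{A_i} \psi_{B_i}^2 + F(1 - \psi_{A_i}) \qquad (2)$$

$$\frac{\partial \psi_{B_i}}{\partial t} = D_{B_i} \nabla^2 \psi_{B_i} - \nabla \cdot (\mathbf{u} \psi_{B_i}) + A e^{-E_a/T} \psi_{A_i} \psi_{B_i}^2 - (F + R) \psi_{B_i} \qquad (3)$$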


where the diffusion coefficients are $D_{A_1} = 2D_{B_1} = D_{A_2} =
2D_{B_2}$, the supply and removal parameters are fixed at $F =
0.03$ and $R = 0.061$, and $\mathbf{u}$ is the local fluid velocity. Note
that in contrast to the standard GSRD system, these equations
have temperature-dependent rates (incorporating the
Arrhenius equation), which all share the same activation energy
$E_a = 1.7$ and frequency factor $A = 3.1$.
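
To make the structure of the governing equations concrete, the following is a minimal sketch of one explicit Euler update for a single GSRD species and the temperature field. It is not the RTLBM used in this paper: it assumes no flow ($\mathbf{u} = 0$), a periodic grid with unit spacing, and a mass-action reaction term equal to the Arrhenius rate constant times $\psi_A \psi_B^2$. The diffusivities, thermal diffusivity and time step (`D_a`, `D_b`, `chi`, `dt`) are hypothetical values, chosen only so that $D_A = 2D_B$ and the explicit scheme is stable.

```python
import math

F, R = 0.03, 0.061      # feed and removal parameters (from the text)
E_A, FREQ = 1.7, 3.1    # activation energy and frequency factor (from the text)

def laplacian(f, i, j):
    """Five-point Laplacian on a periodic grid with unit spacing (assumed)."""
    n, m = len(f), len(f[0])
    return (f[(i + 1) % n][j] + f[(i - 1) % n][j]
            + f[i][(j + 1) % m] + f[i][(j - 1) % m] - 4.0 * f[i][j])

def step(a, b, T, dH, D_a=0.2, D_b=0.1, chi=0.2, dt=0.5):
    """One explicit Euler step without advection (illustrative only).

    a, b, T are lists of lists holding psi_A, psi_B and temperature;
    dH is the reaction enthalpy (negative for exothermic reactions).
    """
    n, m = len(a), len(a[0])
    a2 = [row[:] for row in a]
    b2 = [row[:] for row in b]
    T2 = [row[:] for row in T]
    for i in range(n):
        for j in range(m):
            rate = FREQ * math.exp(-E_A / T[i][j]) * a[i][j] * b[i][j] ** 2
            a2[i][j] = a[i][j] + dt * (D_a * laplacian(a, i, j)
                                       - rate + F * (1.0 - a[i][j]))
            b2[i][j] = b[i][j] + dt * (D_b * laplacian(b, i, j)
                                       + rate - (F + R) * b[i][j])
            T2[i][j] = T[i][j] + dt * (chi * laplacian(T, i, j) - dH * rate)
    return a2, b2, T2

# Usage: seed a single perturbation in an otherwise empty domain.
n = 8
a = [[1.0] * n for _ in range(n)]
b = [[0.0] * n for _ in range(n)]
T = [[1.5] * n for _ in range(n)]
a[4][4], b[4][4] = 0.5, 0.25
a, b, T = step(a, b, T, dH=-20e-3)
```

With a negative `dH` the seeded cell warms slightly, and with a positive `dH` it cools, which is the local coupling that the later sections rely on.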

Fragility 

We are now in a position to explore the stability of a broad
class of chemical patterns. Of course sustained structure formation
on its own is grossly insufficient as an analogue for
life. Many chemical systems show spatial structure (indeed
the first that come to mind are the microscopic crystalline
structures of many solid materials). However, if that structure
is purely a function of external conditions, and has no
means to actively respond to them, then it certainly cannot
be considered life-like.

In this investigation we considered two GSRD systems
in the same domain (they could only interact with each
other implicitly through their thermal influence; there was
no cross-diffusion and there were no cross-reactions). We will
begin at this stage with the assumption that the two reactions are neither
exo- nor endothermic (they are thermally neutral, but their
rates are sensitive to temperature). The steady states of such
systems are shown in Figure 1 for three different temperatures.
Note that in all figures, the temperature colourmaps
are normalised by the same values, with red corresponding
to $T \approx 3$ and pale blue corresponding to $T \approx 1.3$.

We can see that at T = 1.5, the standard self-replicating
spot dynamic is reproduced. However, the emergent patterns
are strongly sensitive to temperature. At
T = 1.4, the rate of the reaction is so slow that the spots
struggle to reproduce and proliferate (Figure 1(a)). At temperatures
of T = 1.3 and lower, the pattern formation all but
ceases.

If the temperature is raised to, for example, T = 1.6, a
different phase of pattern emerges: more of an amorphous
lamellar structure (Figure 1(c)). This is due to the thermal
reaction rate enhancement causing the round spot structures
to be unstable compared to the worm-like formations that
arise instead.

Overall, we see that the delicate spot patterns cannot really
tolerate changes of temperature and so their viable thermal
range is $\Delta T_{nt} < 0.1$.
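
The narrowness of this window can be related to how quickly the reaction rate constant changes with temperature. The short sketch below evaluates an Arrhenius rate constant at the four temperatures discussed above, assuming the form $k(T) = A e^{-E_a/T}$ with the stated $E_a = 1.7$ and $A = 3.1$ (that is, with any Boltzmann or gas constant absorbed into the units of $T$). The function name is hypothetical.

```python
import math

E_A, FREQ = 1.7, 3.1  # activation energy and frequency factor (from the text)

def rate_constant(T):
    """Arrhenius rate constant, assuming k(T) = A * exp(-E_a / T)."""
    return FREQ * math.exp(-E_A / T)

k_ref = rate_constant(1.5)
for T in (1.3, 1.4, 1.5, 1.6):
    k = rate_constant(T)
    print(f"T = {T:.1f}  k = {k:.3f}  k / k(1.5) = {k / k_ref:.3f}")
```

Under this assumption the rate constant at T = 1.5 is close to 1, and a temperature change of 0.1 in either direction shifts it by several percent, which is evidently enough to move the system out of the spot-forming regime.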
[Figure 1 panels: (a) T = 1.4, (b) T = 1.5, (c) T = 1.6]

[Figure 2 panels: (a) t = 1000, (b) t = 2000]


Figure 1: Steady state snapshots of the temperature (left column)
and chemical order parameter fields, $\phi = \psi_{A_{1,2}} - \psi_{B_{1,2}}$,
(centre column for system 1 and right column for system 2),
for a double GSRD system. In these simulations,
both reactions are thermally neutral ($\Delta H_{1,2} = 0$).

Self-destruction 

In the previous section, we saw structures that could not exert
any influence on their environment, and the result was
a high degree of sensitivity to the key environmental variable:
temperature. Perhaps if we add an extra coupling
between the spot patterns and that crucial parameter, they
will be less susceptible to thermal variations. If the reaction
which drives their existence could interact with the temperature
field, a self-stabilising feedback might be induced. An
exothermic reaction might allow the spots to create more
favourable local conditions if the background temperature
was low and becoming a limiting factor. However, there
is no guarantee that such a coupling would confer stability.
What if it eroded the viability of the environment rather than
enhanced it? This section will explore such a possibility.

As explored in our previous work (Bartlett, 2014;
Bartlett and Bullock, 2015), exothermic spots exert a very
strong positive feedback effect on the temperature of their
surroundings. We illustrate the effect in Figure 2.

At the beginning of the simulation the spots begin to replicate
as normal. However, the additional heat released from
the reaction begins to warm up the surroundings, which further
augments the reaction rate. This then takes the system
through different phases of pattern until eventually it
is swamped and the temperature begins to diverge. The simulation
in Figure 2 was carried out at T = 1.4. As one might
imagine, higher temperatures simply reduce the time taken
for the system to ‘explode’.

If the boundary and initial temperatures are taken down to
T = 1.3, the runaway feedback does not occur but pattern
formation is also completely suppressed (the low initial
temperature and concomitantly low reaction rate mean
that small initial fluctuations are unable to grow into stable
structures). Thus exothermic spots are highly unstable, in
fact more so than thermally neutral spots.

[Figure 2 panel: (d) t = 4000]

Figure 2: Snapshots of the temperature and chemical order
parameter fields for a double GSRD system. One species
has an exothermic reaction ($\Delta H_1 = -20 \times 10^{-3}$, centre
column), and the other has a thermally neutral reaction
($\Delta H_2 = 0$, right column). The boundary and initial temperatures
were all fixed at T = 1.4.

What, then, of endothermic spots? Our general finding has
been that these patterns suffer from self-inhibition (Bartlett,
2014; Bartlett and Bullock, 2015). By reducing the temperature
of their surroundings, they reduce the rate of the reaction
that drives them and this often proceeds until the patterns
themselves are all but extinguished. An example of
this is shown in Figure 3.

We see that the endothermic spots rapidly damp themselves
out. The temperature reductions are quite local so
the neutral spots (shown in the central column) only experience
a limited amount of destructive cooling. However, as
the simulation proceeds, the endothermic structures eventually
cause their own demise.

At higher temperatures this self-damping still persists. At
T = 4, for example (Figure 4), there is no initial pattern
formation, and the reaction proceeds at a high rate homogeneously
across the domain. However, small fluctuations near
the boundaries eventually lead to a division of the system
into two layers next to the upper and lower walls, where a
string of stable spots forms.

Between these two layers, in the bulk of the system, the reaction
ceases because the reacting layers at either boundary
extract most of the thermal energy from the interior. Thus
two films of spots persist, making use of the large heat flux
from the boundaries to prevent their own self-destruction.
[Figure 3 and Figure 4 panel labels: (b) t = 1000, (c) t = 5000, (d) t = 10000; (a) t = 4000, (b) t = 5000, (c) t = 7000, (d) t = 10000]

Figure 3: Snapshots of the temperature and chemical order
parameter fields for a double GSRD system. One species has
a thermally neutral reaction ($\Delta H_1 = 0$, centre column), and
the other has an endothermic reaction ($\Delta H_2 = 25 \times 10^{-3}$,
right column). The boundary and initial temperatures were
all fixed at T = 1.5.


Figure 4: Snapshots of the temperature and chemical order
parameter fields for a GSRD system with two different
spot species. One species has a thermally neutral reaction
($\Delta H_1 = 0$, centre column), and the other has an endothermic
reaction ($\Delta H_2 = 25 \times 10^{-3}$, right column). The boundary
and initial temperatures were all fixed at T = 4.


Despite this effect, the heat extracted by the persisting spots
prevents them from spreading throughout the domain,
so they do not proliferate. Thus even at very high temperatures,
endothermic spots impose a fundamental limit on their
own existence.

Harmony 

In the previous sections we illustrated the fundamental
fragility of self-reproducing spot patterns in thermal GSRD
systems. If the reaction neither releases nor absorbs heat,
the type of emergent pattern has a strong temperature dependence,
and self-replicating spots only occur within a narrow
thermal window. If the reaction of one of the spot species
is exothermic (and the other is neutral), there is a runaway
positive feedback and the temperature diverges, taking all
chemical structures with it. Conversely, when one of the
two reactions is endothermic (and the other neutral), the effect
is one of self-limitation. The cooling from the reaction
has the effect of reducing its own rate, which in turn leads to
the damping out of patterns.

At this stage we might naturally wonder whether some
combination of exo- and endothermic spots might be able
to mutually stabilise one another. To harmonise means to
agree, to complement, and so we will now experiment with
systems composed of complementary spot systems. They
will consist of an exothermic spot species with $\Delta H_1 =
-20 \times 10^{-3}$, and an endothermic species with $\Delta H_2 =
25 \times 10^{-3}$. The slightly higher magnitude of the enthalpy
change of the endothermic reaction is necessary to ensure
that the exothermic reaction cannot push the system into a
positive feedback cycle from a small fluctuation (e.g. a temporary
period where there is a smaller than average number
of endothermic spots).
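
The role of the enthalpy asymmetry can be illustrated with a simple heat budget. The sketch below assumes, purely for illustration, that the net heat released is the enthalpy-weighted sum of the total reaction rates of the two species; the function names and the notion of a single total rate per species are hypothetical simplifications, not quantities defined in this paper.

```python
DH_EXO = -20e-3   # enthalpy change of the exothermic species (from the text)
DH_ENDO = 25e-3   # enthalpy change of the endothermic species (from the text)

def net_heating(rate_exo, rate_endo):
    """Net heat release per unit time for given total reaction rates.

    Positive values warm the system; negative values cool it.
    """
    return -(DH_EXO * rate_exo + DH_ENDO * rate_endo)

def balancing_ratio():
    """Ratio rate_endo / rate_exo at which heating and cooling cancel."""
    return -DH_EXO / DH_ENDO

print(balancing_ratio())          # endothermic activity needed per unit exothermic activity
print(net_heating(1.0, 1.0))      # equal activity gives net cooling
```

Under this simplification, equal activity of the two species produces net cooling, so the exothermic species must be somewhat more active than the endothermic one before the system warms. This is consistent with the stated aim of preventing a small fluctuation from triggering runaway heating.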

If we proceed as before with initial and boundary temperatures
of T = 1.5, we indeed observe stable behaviour, as
shown in Figure 5(b). There is stable self-replication and the
heat emitted by species 1 (substances $A_1$ and $B_1$) is compensated
for by the heat absorption of species 2 (substances $A_2$
and $B_2$).

These symbiotic patterns can also provide a stable environment
for themselves at higher temperatures, such as
T = 2.0. After an initial transient phase, the system settles
into a steady state with a stable population of both spot
species (Figure 5(c)). The reason the structures survive is
that they are carrying out a form of temperature regulation.
Despite the boundaries being warmer than ideal, the combined
spot system is able to regulate the bulk temperature to
$T_{in} \approx 1.5$. If the boundary and initial temperatures are less
than T = 1.45, the endothermic spots extinguish themselves
before the exothermic ones can provide compensatory heating.

In conclusion, this synergy of an exothermic
with an endothermic spot species yields a viable temperature
range of $\Delta T_{sym} \approx 1$, much greater than that of the equivalent
thermally neutral system (note also that when one species
was exothermic or one was endothermic, there was essentially
no viable range over which spots stably formed).

Figure 5: Snapshots of the temperature and chemical order
parameter fields for a double GSRD system with three different
values for the boundary and initial temperatures. One
species has an exothermic reaction ($\Delta H_1 = -20 \times 10^{-3}$,
centre column), and the other has an endothermic reaction
($\Delta H_2 = 25 \times 10^{-3}$, right column).

Having established the temperature window within which 
the symbiotic system can persist, we can also consider the 
influence of initial conditions. Perhaps it is possible that 
once established, a stable population could endure thermal 
perturbations beyond the range of static temperatures that 
we have already observed. 

Resilience 

In this section we will carry out stress tests in which a stable
population of spots is subjected to temperature variations
that go beyond its viable range. Note that with initial and
boundary temperatures outside of the range $1.45 < T < 2.5$,
the spots are either destroyed by themselves or the phase
of pattern transforms and the spots are mostly diminished.
While this range is much greater than the small range of the
neutral patterns, there are circumstances under which it can
be extended yet further.

Here we will initialise the system at $T_0 = 1.5$ and then
vary the temperature in linear ramps over the range $0.8 <
T < 2.5$. The results of this experiment are illustrated in
Figure 6 and can also be viewed as an animation (Bartlett,
2016).

The most striking feature of this graph is how small the
variations in the internal temperature are compared to the
variations of the boundary temperatures. The bulk of the
system stays within the range $1.4 < T_{in} < 1.5$, despite the
fact that the boundary temperatures span the range $0.8 <
T_b < 2.5$. It seems that the two sets of interacting chemical
patterns are able to regulate their local temperature such that
it remains close to a suitable value for their own persistence.

Figure 6: Population dynamics of the spot patterns within a
two-species thermal GSRD system. The numbers of spots of
the two types are shown as $N_{exo}$ (red curve) and $N_{endo}$ (blue
curve). Also shown are the imposed boundary temperature
$T_b$ (cyan curve), and the mean bulk temperature, $T_{in}$ (green
curve). Note that this is the mean temperature of the inner
region of the domain, for which $0.2H < z < 0.8H$.

From Figure 6, we can start to assess the mechanism behind
this thermal homeostasis. When the temperature is
lower than ideal, the exothermic spots tend to outnumber
their endothermic counterparts. It makes sense that colder
conditions make it harder for heat absorbing structures to
survive but easier for those that are heat emitting.

Conversely, when the temperature is raised high enough to
destroy the structures under normal conditions, we see that
the endothermic spots instead have a greater population than
those that are exothermic. In this case the damping effect of
the endothermic spots is reduced by the warmer boundary
temperature and hence they are able to increase in number,
so much so that they outnumber the exothermic spots and
contribute to the maintenance of the bulk temperature close
to $T_{in} \approx 1.5$.

Note that even with ideal initial temperatures, the spot
patterns cannot persist when the boundary temperatures are
held below $T_b = 0.8$. Likewise above $T_b = 2.5$, the system
begins to bifurcate to a different GSRD phase, and there
are few spot structures left. Hence the viable range of the
symbiotic system was $\Delta T_{sym} \approx 1.7$, when the initial temperature
was close to the ideal value of 1.5.

What we have seen here is a prime example of precariousness
(Virgo, 2011; Froese et al., 2013). These dissipative
structures do not persist if the initial temperature is lower
than $\approx 1.45$, but if a population is established, it can then
stabilise its environment even in the face of deadly external
changes.
Conclusions

In this paper we have demonstrated how a regulatory mechanism,
akin to homeostasis in living organisms, can emerge
in a very simple non-living system. We carried out numerical
simulations of domains containing two GSRD systems. We
found that when the rate constant takes on a standard kinetic
dependence upon temperature, the emergent patterns in the
system are readily destroyed when the temperature is perturbed
away from the ideal pattern-forming value.

When enthalpy changes are introduced, making only one 
of the reactions either exo- or endothermic, the emergent 
structures are destroyed by strong positive feedback in the 
case of heat release, or damping in the case of heat uptake. 
Hence when the spot patterns have a singular influence on 
the temperature field, there is still no thermal resilience. 

However, with a combined system in which one of the two
reactions is exothermic and the other endothermic, we find
that robust self-stabilisation occurs. Even when the boundary
temperature was raised to $T_b = 2.5$, the combined action
of the spot patterns maintained the internal temperature
of the system very close to the ideal growth value of
T = 1.5. Also in colder than optimal conditions, the two
sets of structures were able to retain the ideal temperature
in the midst of the system. Note that for the very low temperatures,
T < 0.8, it was necessary to initialise the system
at a higher temperature, such that a population of spots
could first establish themselves, before surviving externally
imposed temperature reductions. Overall the internal variation
in temperature was only $\approx 10\%$ of the boundary temperature
variation (see Figure 6).

We observed that the regulation arose purely from
changes in the number of spot patterns of the two different
types. In cool conditions, the exothermic spot population
rose above that of the endothermic spots. In hot conditions
the opposite occurred. It is clear that with a simple set
of feedbacks between a control variable and the strength of
the objects carrying out the control, robust regulation of that
variable can emerge spontaneously.

There are many parallels between the dynamics of our
system and the characteristic Daisyworld family of models
(Harvey, 2015; Watson and Lovelock, 2011; Wood et al.,
2008). In both cases, there exist ‘active’ components of a
system which somehow influence external forcings or parameters
that have a fundamental impact on their persistence.
In the case of Daisyworld, black daisies tolerate cold
temperatures well because they have a warming effect (due
to their lower albedo) and white daisies persist at high temperatures
because they cause cooling. When combined
on Planet Daisyworld (described through a set of coupled,
non-linear differential equations), the combined system
appears to be able to regulate its temperature in the face
of changes in external radiative forcing, just like our combined
GSRD system.

However, there are several key differences between
Daisyworld and our GSRD world. The two species of Daisyworld
do not destroy themselves when placed in isolation.
In fact there are some residual stabilisation effects even when
only one species of daisy is present (Watson and Lovelock,
2011). So in Daisyworld, one species confers some regulation,
and two species confer much more. In the model
presented in this paper, one species with thermal effects destroys
structure, whereas two species with thermal effects
permit strong stabilisation.

The common feature of both models is of course that
two rein controls within a system allow a strong degree
of negative feedback around a certain point in phase space.
The original forms of our model system were not created
to demonstrate the spontaneous emergence of homeostasis,
as Daisyworld was. Furthermore, the parameter range over
which daisies can survive is explicitly linked to the width
of the (prescribed) growth rate-temperature curve. In our
model, the viability range of the patterns is not prescribed,
but emergent.

Observations of our system may have implications for
ways in which primordial life might have been able to start to
influence its own survival chances by acquiring simple feedback
mechanisms with its environment. Perhaps transitions
such as the prokaryote to eukaryote transition were heavily
influenced by the additional feedback or control conferred
on a larger organism when a smaller organism infected it.
Perhaps many of the characteristic homeostasis mechanisms
that we see in extant life started off as simple push-pull feedback
combinations such as those illustrated in this work.

Further Work 

In this work we have explored an interesting class of dynamics
in emergent pattern formation. There remain many
interesting extensions to this work that could further reveal
new phenomena. The reaction enthalpies of the two spot
systems could be further increased. In fact it would be desirable
to explore the scaling of thermal resilience as a function
of the reaction enthalpy magnitudes. It could be that
with extremely (thermally) strong reactions, small fluctuations
could favour one or the other spot species, yielding a
collapse of the system (through e.g. temperature divergence
from dominance of the exothermic species).

In Section 7.4 of Bartlett (2014) (wherein the reaction enthalpies
were lower in magnitude than those used in the current
work), it was observed that pairs of thermally interacting
spots (one from each species) locked together spatially.
It would be useful to establish the point at which this becomes
unnecessary (i.e. at what level of reaction enthalpy)
for the survival of the two-species spot system. Furthermore,
one could vary the diffusion constants of the two species
such that, e.g., several spots of one species were embedded
within the envelope of one spot of the other species.
Perhaps several small exothermic spots with the correct enthalpy
value can stabilise one much larger endothermic spot.
Such a dynamic takes inspiration from ideas concerning the
prokaryote-eukaryote evolutionary transition.
Abstract

An artificial chemistry with composition devices borrowed
from object-oriented and functional programming languages
was introduced in prior work. Actors in object-oriented combinator
chemistry are embedded in space and subject to diffusion;
since they are neither created nor destroyed, mass is
conserved. This paper further develops these ideas and applies
them in significant ways. First, it introduces the concept
of a self-replicating system’s normalized complexity. Normalized
complexity permits comparisons between artificial
organisms defined in different virtual worlds by explicitly accounting
for the relative complexities of both organism and
world. Second, object-oriented combinator chemistry is used
to define a parallel, asynchronous, spatially distributed self-replicating
system modeled in part on the living cell. This
system is strongly constructive since interactions among its
parts result in the construction of more of these same parts;
constructed parts are assembled from elements of a few primitive
types. The system’s high normalized complexity is contrasted
with that of a simple composome, which is also defined.

Introduction 

Much as Turing (1936) had done when motivating his abstract
computing machine by comparing it to a human ‘computer’
executing programs with paper and pencil, von Neumann
(1966) began his study of self-replication in the abstract
by thinking about a concrete physical machine. As
imagined, von Neumann’s kinematic automaton assembled
copies of itself from a supply of components undergoing
random motion on the surface of a lake. The components
consisted of girders, sensors, effectors, logic gates and delays,
together with tools for welding and cutting. It is unlikely
that von Neumann ever intended to actually build a
physical self-replicating machine. More likely, he regarded
the kinematic automaton as a thought experiment, and abandoned
it when he understood how the problems of self-reference,
control and construction that truly interested him
could be rigorously formulated in the abstract domain of cellular
automata (CA).

By abandoning his kinematic automaton, von Neumann
became the first ‘player’ of a sometimes abstruse ‘game’
that many others have played since (Sipper, 1998). This
‘game’ has two parts and two pitfalls. Roughly speaking,
the parts are: define a model of computation; and define a
self-replicating object (or system) in the model. The two pitfalls,
which must be avoided if the ‘game’ is to be non-trivial,
are: making the computational model too abstract; and making
the primitives too complex. For example, it is trivial for a
self-replicating object defined as a 1 to ‘replicate’ in an array
of 0s if physics is defined to be a Boolean ‘or’ operation in
neighborhoods. It is equally trivial for a self-replicator comprised
of a robotic arm, a camera, and a computer to make
copies of itself given a supply of robotic arms, cameras and
computers. Von Neumann himself was very conscious of
the parts and pitfalls and discusses the tradeoffs the ‘game’
presents at length.¹ His ingenious solution was (characteristically)
a saddle point, combining a fiendishly simple model
of computation and an enormously complex self-replicating
object.
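
The first trivial case can be made concrete in a few lines. The sketch below assumes a one-dimensional ring of cells and a neighborhood consisting of a cell and its two nearest neighbors; neither choice is specified in the text, and the function name is hypothetical.

```python
def or_step(cells):
    """One update of the 'physics': each cell becomes the Boolean OR
    of itself and its two neighbors on a ring (assumed topology)."""
    n = len(cells)
    return [cells[(i - 1) % n] | cells[i] | cells[(i + 1) % n]
            for i in range(n)]

world = [0, 0, 0, 1, 0, 0, 0]   # a single 'replicator' in an array of 0s
for _ in range(3):
    world = or_step(world)
    print(world)
```

The 1 spreads to fill the array, but only because the update rule does all of the work; the ‘organism’ itself contributes nothing, which is the sense in which an overly abstract model makes replication trivial.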

Nearly sixty years after von Neumann’s death, no one has
yet constructed a kinematic automaton of the kind he imagined.
However, because the ‘game’ seems to many of us to
still afford the best prospect by which to address the twin
problems of the origin of life on Earth and its evolution into
forms of increasing complexity, there is no shortage of new
‘players.’ Fortunately, current ‘players’ are the beneficiaries
of a wealth of biological science unknown to von Neumann
(Watson and Crick, 1953), of significant advances in the science
of computing that von Neumann and Turing played
seminal roles in founding, and of a growing body of work
in the interdisciplinary fields of artificial life and complex
systems.

Unsurprisingly (given the preceding), this paper contains
descriptions of both a new model of computation and of a
self-replicating system defined using that model. The design
of our model is strongly influenced by the belief that
something important was lost when von Neumann adopted
cellular automata as his model. More specifically, we believe
that conservation of mass, a law which all machines
that assemble copies of themselves from parts must obey,
was (in effect) the “baby thrown out with the bath water.”

¹ See von Neumann (1966) pp. 76-77 and Arbib (1966) p. 179.

Figure 1: Actor datatype in object-oriented combinator
chemistry (OOCC) (left). Self-replicating system of ribosome
and replisome factories (right).


Our starting point is artificial chemistry (Dittrich et al.,
2001), the study of the population dynamics of systems of
constructible objects, which Fontana and Buss (1996) called
constructive dynamical systems. Like them, we looked to
the field of computer science for inspiration, hoping to repurpose
elements of modern object-oriented and functional
programming languages as primitives and composition devices
in an artificial chemistry. From object-oriented programming
we borrowed the ideas of object composition
and the association of programs with the data they operate
on; from functional programming, we borrowed the idea of
construction of programs by composition of ‘program fragments’
or combinators.

The first pitfall (excess abstraction) is avoided using a
twofold strategy. First, we make our artificial chemistry concrete
by embedding its constructed objects in space and relying
solely on diffusion for dynamics. Significantly, to make
this tractable, aggregates are treated as masses (unlike Arbib
(1966), who treated aggregates as areas), and mass is
conserved, which makes it a more plausible host for a kinematic
automaton of the sort imagined by von Neumann. Second,
as a guarantee of a different kind of realism, we insist
that our artificial chemistry must be a bespoke physics interface
as defined by Ackley (2013). More specifically, it
must function as an abstract interface to a physically realizable
indefinitely scalable computational substrate. This is
consistent with the notion that kinematic automata defined
using such interfaces, and which replicate by assembly of
conserved parts, are tantamount to physical machines.

We avoid the second pitfall (complex primitives) by using 
(admittedly) complex primitives to build a self-replicating 
system composed of parts that are still more complex. However,
these more complex parts are constructed by the system
itself! It follows that the system is strongly constructive in 
the sense that interactions among its parts result in the con¬ 
struction of more of these same parts (Dittrich et al., 2001). 


Our inspiration, the ribosome, allowed us to imagine pro¬ 
grams as enzymes and to define a pair of representations for 
programs, one spatially distributed and inert, the other com¬ 
pact and metabolically active. The self-replicating system 
that resulted is a parallel, asynchronous, distributed compu¬ 
tation modeled in part on the living cell. See Figure 1. 

The model of computation described in this paper has 
features in common with prior work on automata and arti¬ 
ficial chemistry. The idea of movable aggregates of com¬ 
plex automata has precedent in Arbib (1966). The idea 
that kinematic automata can be built in embedded artifi¬ 
cial chemistries has precedent in Laing (1977), Smith et al. 
(2003) and Hutton (2004). The use of nested multisets 
for object composition in an artificial chemistry has prece¬ 
dent in Paun (1998). The use of sequences of combina¬ 
tors as molecules and conservation of mass has precedent 
in di Fenizio (2000). The idea of molecules as programs 
has precedent in Laing (1977), Fontana and Buss (1996), 
di Fenizio (2000) and Hickinbotham et al. (2011). Finally, 
Taylor (2001) has argued that embeddedness and competi¬ 
tion for matter, energy and space are necessary in artificial 
life systems capable of open-ended evolution. 

Normalized Complexity 

Pattee (1995) has described the simulation of an organism in 
a virtual world as an initial value problem where organisms 
are contingent states subject to the non-contingent laws of 
physics. In the ‘game’ of inventing both, there is a tradeoff 
between the non-contingent complexity of models of physi¬ 
cal law, and the purely contingent complexity of artificial or¬ 
ganisms defined inside those models. If physical law is too 
powerful, self-replication becomes trivial; life is too easy. 
Conversely, if physical law is not powerful enough, self¬ 
replication becomes impossible; life is too hard. It’s pos¬ 
sible that the most interesting games, those resulting in a 
bootstrapping process that culminates in organisms capable 
of open-ended evolution, are grounded in physical law “just 
powerful enough.” Von Neumann’s universal replicator $R_V$
and its cellular automata virtual world $CA_V$ suggest that
interesting solutions to the ‘game’ are saddle points, maximizing
the ratio of contingent complexity $K(R_V \mid CA_V)$ to
non-contingent complexity $K(CA_V)$:

$$\frac{K(R_V \mid CA_V)}{K(CA_V)}$$

where $K$ is Kolmogorov complexity (Kolmogorov, 1965).$^2$

2 Kolmogorov complexity is correct only if the replicator contains
no untranslated information. Indeed, a pure template replicator,
e.g., Smith et al. (2003), might have very high Kolmogorov
complexity yet its normalized complexity is zero since it is composed
entirely of information that (apart from being copied) is
never used.

To explore this idea, let’s consider a hierarchy of computational
models; each model is built using an interface exposed
by a more fundamental model. For example, the problem of
simulating a CA with a more fundamental CA is described 
by Smith (1971). Among many other things, he showed that 
any CA with a Moore neighborhood (8 neighbors) can be 
simulated by a CA with a von Neumann neighborhood (4 
neighbors) with an increase in space and a slowdown in time 
by constant factors that depend only on the numbers of states 
in the CA being simulated: 

$$CA_8(\mathbb{Z}^2) \triangleleft CA_4(\mathbb{Z}^2)$$

where $CA_8$ and $CA_4$ are CAs with Moore and von Neumann
neighborhoods, $\mathbb{Z}^2$ is the integer lattice and ($\triangleleft$) is Smith’s
$O(1)$ reduction.

Asynchronous cellular automata (ACA) are much like 
cellular automata except that local state is updated asyn¬ 
chronously (Nehaniv, 2004). It is possible to demonstrate 
by construction that any CA can be simulated by an ACA 
with an increase in space (Nakamura, 1974; Nehaniv, 2004) 
and a slowdown in time (Berman and Simon, 1988) by con¬ 
stant factors that depend only on the number of states and 
neighborhood size of the CA. Consequently, 

$$CA(\mathbb{Z}^2) \triangleleft ACA(\mathbb{Z}^2)$$

where ($\triangleleft$) is Nakamura’s $O(1)$ reduction. Our approach is
premised on the idea that models in the object-oriented combinator
chemistry (OOCC) defined in this paper can be compiled
to ACAs of one higher dimension. Objects in the artificial
chemistry are instances of a recursive datatype grounded
in a small number of primitive types and closed under two
forms of composition. The extra dimension is used to represent
the internal structure of composed objects and the size
of this representation is defined as an object’s mass:

$$OOCC(\mathbb{Z}^2) \triangleleft ACA(\mathbb{Z}^2 \times \mathbb{N})$$

where ($\triangleleft$) is the hypothesized compilation process. Unlike
Arbib (1966), who assumed that arbitrarily large automata
aggregates could be moved $O(1)$ distance in $O(1)$ time, we
assume only that objects of mass $m$ can be moved $O(1)$ distance
in $O(m)$ time.

While the significance of our work does not depend on it,
the hypothesized compilation process is intriguing because
ACAs of dimension three or less can (in principle) be physically
realized in hardware. Furthermore, this can be done
such that the abstract dimensions of space and time in the
ACA (and of all models that have been $O(1)$ reduced to it)
are coextensive with physical dimensions of space and time:

$$ACA(\mathbb{Z}^3) \triangleleft U$$

where U is the physical universe. This is the basis for the 
claim that our artificial chemistry is a bespoke physics in¬ 
terface as defined by Ackley (2013) and that kinematic au¬ 
tomata built with it are tantamount to physical machines. 


Given a replicator $R$ defined on top of a hierarchy of models
reducible to $U$ by $O(1)$ reduction, $R \triangleleft M_N \triangleleft \cdots \triangleleft M_1 \triangleleft U$,
the ratio of the contingent and non-contingent complexities
of replicator $R$ becomes

$$\frac{K(R \mid M_N)}{K(M_N \mid M_{N-1}) + \cdots + K(M_2 \mid M_1) + K(M_1)}.$$

The meaning of this quantity, which will henceforward be 
termed a replicator’s normalized complexity , is best illus¬ 
trated by an example. Codd (1968) was able to signifi¬ 
cantly simplify von Neumann’s replicator and its host CA. 
Although the contingent complexity of the Codd replicator 
is significantly less than that of the von Neumann replica¬ 
tor, we speculate that (were they calculated) their normal¬ 
ized complexities would be closer in value. 
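
As a rough numerical illustration of the ratio defined above, the following Python sketch evaluates a normalized complexity from supplied complexity estimates. Kolmogorov complexity is uncomputable, so the inputs are placeholders; the function name `normalized_complexity` and every number below are assumptions made for illustration and are not values taken from this paper.

```python
from typing import Sequence


def normalized_complexity(contingent: float, layers: Sequence[float]) -> float:
    """Return K(R | M_N) divided by the summed complexity of the model hierarchy.

    `layers` holds the non-contingent terms
    K(M_N | M_{N-1}), ..., K(M_2 | M_1), K(M_1).
    """
    non_contingent = sum(layers)
    if non_contingent <= 0:
        raise ValueError("non-contingent complexity must be positive")
    return contingent / non_contingent


# Hypothetical estimates in arbitrary units (not measurements).
host_layers = [400.0, 100.0]  # stand-ins for K(CA | ACA) and K(ACA)
big_replicator = normalized_complexity(250.0, host_layers)
small_replicator = normalized_complexity(50.0, host_layers)
print(big_replicator, small_replicator)
```

Because both replicators share the same host hierarchy, the denominators agree and the comparison reduces to a comparison of contingent complexities.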

Langton (1984) defined a much simpler ‘loop’ replicator
$L_L$ on top of the Codd CA substrate. Its contingent complexity,
$K(L_L \mid CA_C)$, is much less than that of the Codd replicator,
$K(R_C \mid CA_C)$. Nehaniv (2004) showed how the Codd
CA substrate could be $O(1)$ reduced to an ACA and demonstrated
the Langton ‘loop’ running on top of the Codd CA
running on top of this ACA. These results allow us to compare
the normalized complexities of the Codd replicator and
the Langton ‘loop’ defined with respect to the same hierarchy
of computational models:

$$\frac{K(L_L \mid CA_C)}{K(CA_C \mid ACA_N) + K(ACA_N)} < \frac{K(R_C \mid CA_C)}{K(CA_C \mid ACA_N) + K(ACA_N)}.$$

Hutton’s work on “artificial cells” provides a second example
(Hutton, 2004). In Hutton’s virtual world, physical
law takes the form of an artificial chemistry defined by a set
of 34 graph rewrite rules. Hutton’s artificial organism is a
cell-like configuration of atoms $C_H + P_1$ containing a small
(non-functional) information payload $P_1$. Significantly, Hutton
demonstrated that both the ‘cell’ and its payload are
replicated by the ‘reaction’ rules of the artificial chemistry.
Because the payload $P_1$ is untranslated, the contingent complexity
of Hutton’s cell is $K(C_H \mid AC_{34})$.

Hutton subsequently extended $AC_{34}$ by adding six rules
for translating the payload $P_1$ into an ‘enzyme’ $E_1$ capable
of ‘catalyzing’ an arbitrary reaction and used this enzyme
to replace one of the graph rewrite rules, $R_1$. In doing so,
the (non-functional) information payload becomes a (functional)
partial genome and some part of the system’s complexity
moves from the non-contingent to the contingent category.
However, this exchange is insufficient to offset the
increase in non-contingent complexity that results from the
addition of the six rules. Consequently,

$$\frac{K(E_1 \mid P_1, AC_{40} - R_1) + K(C_H + P_1 \mid AC_{40} - R_1)}{K(AC_{40} - R_1)} < \frac{K(C_H \mid AC_{34})}{K(AC_{34})}$$

where $K(E_1 \mid P_1, AC_{40} - R_1)$ is zero since $P_1$ encodes $E_1$
using a process defined by the artificial chemistry $AC_{40} - R_1$.

Object-Oriented Combinator Chemistry

There are three types of actors: objects, methods and combinators.
Objects and methods are like objects and methods
in object-oriented programming. More specifically, objects
are containers for actors, methods are programs that
govern actors’ behaviors, and combinators are the building 
blocks used to construct methods (see Figure 1). Like amino 
acids, which can be composed to form polypeptides, primi¬ 
tive combinators can be composed to form composite com¬ 
binators. A method is just a composite combinator that has 
been repackaged or unquoted. Prior to unquoting, combi¬ 
nators do not manifest behaviors, so unquoting might corre¬ 
spond (in this analogy) to the folding of a polypeptide chain 
into a protein. 

Objects are multisets of actors. They are of four immutable
types constructed using $\{\,\}_0$, $\{\,\}_1$, $\{\,\}_2$ and $\{\,\}_3$.
For example, $\{x, y, z\}_2$ is an object of type two that contains
three actors, $x$, $y$ and $z$. Combinators are composed with
($>\!=\!>$) and quoted and unquoted using $()^{-}$ and $()^{+}$. Primitive
combinators and empty objects have unit mass. The
mass of a composite combinator is the sum of the masses 
of the combinators of which it is composed. The mass of 
an object is the sum of its own mass and the masses of the 
actors it contains. Since actors can neither be created nor 
destroyed, mass is conserved. 
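
The mass rules just stated can be sketched as a small recursive datatype in Python. This is a minimal illustration only: the class names are hypothetical, and the assumption that a method keeps the mass of the combinator it was unquoted from is ours, since the text gives rules only for combinators and objects.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Primitive:
    """A primitive combinator; has unit mass."""
    name: str


@dataclass(frozen=True)
class Composite:
    """A composite combinator built from other combinators."""
    parts: Tuple["Combinator", ...]


Combinator = Union[Primitive, Composite]


@dataclass(frozen=True)
class Method:
    """An unquoted combinator (assumed here to keep the combinator's mass)."""
    body: Combinator


@dataclass(frozen=True)
class Obj:
    """An object of a given type holding a multiset of actors."""
    kind: int
    contents: Tuple["Actor", ...] = ()


Actor = Union[Primitive, Composite, Method, Obj]


def mass(actor: Actor) -> int:
    """Mass of an actor according to the rules in the text."""
    if isinstance(actor, Primitive):
        return 1
    if isinstance(actor, Composite):
        return sum(mass(part) for part in actor.parts)
    if isinstance(actor, Method):
        return mass(actor.body)
    if isinstance(actor, Obj):
        # Unit mass for the (empty) object itself plus everything it contains.
        return 1 + sum(mass(child) for child in actor.contents)
    raise TypeError(f"not an actor: {actor!r}")


enzyme = Method(Composite((Primitive("a"), Primitive("b"), Primitive("c"))))
container = Obj(2, (enzyme, Obj(2)))
print(mass(enzyme), mass(container))
```

Since every operation only rearranges existing actors, the sum of `mass` over all root actors stays constant, which is the conservation property the text relies on.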

Actors are reified by assigning them positions in a 2D vir¬ 
tual world. Computations progress when actors interact with 
other actors in their Moore neighborhoods by running meth¬ 
ods. Methods are sequences of combinators compiled from 
programs defined in a visual programming language. The 
programs in this language, dataflow graphs , serve as ab¬ 
stract specifications of actors’ behaviors. Neither the visual 
programming language nor the combinator language is described
in this paper since both were described at length in
Williams (2015). 

All actors are subject to diffusion. An actor’s diffusion
constant decreases inversely with its mass. This reflects the
real cost of data transport in the (notional) $ACA(\mathbb{Z}^2 \times \mathbb{N})$ substrate.
Multiple actors can reside at a single site, but diffusion
never moves an actor to an adjacent occupied site if
there is an adjacent empty site. 
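
One way to read the diffusion rule is sketched below in Python. Several details are assumptions made only for this illustration and are not specified in the text: moves are attempted with probability 1/mass per step, the candidate sites are the Moore neighbors, and the world is a bounded rectangle. The function name `diffusion_step` is hypothetical.

```python
import random

# Offsets of the eight Moore neighbors.
MOORE = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def diffusion_step(pos, mass, occupied, rng, width, height):
    """Return the actor's position after one diffusion attempt.

    pos      -- current (x, y) site
    mass     -- actor (or embedded group) mass; heavier actors move less often
    occupied -- set of sites that already hold at least one actor
    """
    if rng.random() >= 1.0 / mass:
        return pos  # no move this step
    x, y = pos
    neighbors = [(x + dx, y + dy) for dx, dy in MOORE
                 if 0 <= x + dx < width and 0 <= y + dy < height]
    empty = [site for site in neighbors if site not in occupied]
    # Occupied sites are used only when no adjacent site is empty.
    candidates = empty if empty else neighbors
    if not candidates:
        return pos
    return rng.choice(candidates)
```

With `mass == 1` the actor always attempts a move, and it lands on an empty neighbor whenever one exists, matching the preference stated in the text.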

The object that contains an actor (with no intervening ob¬ 
jects) is termed the actor’s parent. An actor with no parent is 
a root. Root actors (or actors that have the same parent) can 
associate with one another by means of groups and bonds. 
Association is useful because it allows working sets of ac¬ 
tors to be constructed and the elements of these working sets 
to be addressed in different ways. 

The first way in which actors can associate is as mem¬ 
bers of a group. All actors belong to exactly one group and 
this group can contain a single actor. For this reason, the 
group relation is an equivalence relation on the set of ac¬ 
tors. A group of root actors is said to be embedded. All 
of the actors in an embedded group diffuse as a unit and all 


methods run by actors in an embedded group (or contained 
inside such actors) share a finite time resource in a zero sum 
fashion. Complex computations formulated in terms of large 
numbers of actors running methods inside a single object or 
group will therefore be correspondingly slow. Furthermore, 
because of its large net mass, the object or group that con¬ 
tains them will also be correspondingly immobile. 

The second way in which actors can associate is by bonding.
Bonds are short relative addresses that are automatically
updated as the actors they link undergo diffusion. Because
bonds are short ($L_1$ distance less than or equal to
two), they restrict the diffusion of the actors that possess
them. Undirected bonds are defined by the hand relation
$H$, which is a symmetric relation on the set of actors, i.e.,
$H(x,y) = H(y,x)$. Directed bonds are defined by the previous
and next relations, $P$ and $N$, which are inverse relations
on the set of actors, i.e., $P(x,y) = N(y,x)$. An actor can possess
at most one bond of each type.
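
The algebraic constraints on bonds (symmetry of $H$, $P$ and $N$ being inverses, at most one bond of each type per actor, and the $L_1$ range limit) can be sketched as a small bookkeeping class. The names `BondTable` and `within_bond_range` are hypothetical, and storing bonds in dictionaries is an illustrative choice rather than the paper's representation.

```python
class BondTable:
    """Tracks hand, next and prev bonds between hashable actor identifiers."""

    def __init__(self):
        self.hand = {}  # symmetric: hand[x] == y implies hand[y] == x
        self.next = {}  # next[x] == y means N(x, y)
        self.prev = {}  # prev[y] == x means P(y, x), the inverse of N(x, y)

    def add_hand(self, x, y):
        if x in self.hand or y in self.hand:
            raise ValueError("an actor can possess at most one hand bond")
        self.hand[x] = y
        self.hand[y] = x

    def add_next(self, x, y):
        if x in self.next or y in self.prev:
            raise ValueError("an actor can possess at most one next/prev bond")
        self.next[x] = y
        self.prev[y] = x


def within_bond_range(a, b):
    """True when sites a and b are at L1 distance two or less."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) <= 2
```

A diffusion move for a bonded actor would be accepted only if `within_bond_range` still holds for every bond it possesses after the move.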

Apart from composition, containment, groups and bonds 
there is no other mutable persistent state associated with ac¬ 
tors. In particular, there are no integer registers. Primitive 
combinators exist for addressing individual actors or sets of 
actors using most of these relations. Other primitive combi¬ 
nators modify actors’ persistent states. 

Composomes 

Composomes are quasi-stationary molecular assemblies that 
preserve compositional information (Segre et al., 2000). As 
self-replicating entities, they possess very low normalized 
complexity because they do not construct the parts of which 
they are comprised, and individually, these parts are more 
complex than the composome itself. Nevertheless, a com- 
posome serves as a good first example, and we can construct 
one by defining a set of behaviors using dataflow graphs and 
reifying them as an embedded group of methods: 

X = {cmpA, cmpB, cmpC } 

where cmpA, cmpB and cmpC are the three methods and 
{ } denotes an embedded group. The composome’s first two 
methods run in the mother group (the group being copied) 
while its third runs in the daughter group (the copy). Be¬ 
cause methods contained in different embedded groups run 
in parallel, they do not compete for cycles; this decreases the 
time required for self-replication. 

• If cmpA is in a group with others but no members of its 
group have bonds then it finds an unbonded actor in its 
neighborhood similar to itself with no others in its group 
and creates a next bond with it. 

• CmpB first verifies it is in the mother group. If it also has 
an unbonded neighbor similar to a member of its group 
and if the neighbor is not already in a group then it adds 
the neighbor to the daughter’s group. 

Figure 2: Self-replication by composome. a) CmpA forms
a next bond with another cmpA instance in its neighborhood.
b) CmpB finds a cmpC instance in its neighborhood
and adds it to the daughter group. c) CmpC (in the daughter
group) deletes the bond joining the mother and daughter
groups after cmpB is added.


• CmpC first checks to see if it is in the daughter group. It 
does this by verifying that there is a group member with 
a prev bond (which can only be of type cmpA). It then 
verifies that there is also a group member not similar to 
either the cmpA instance or itself. By process of elimina¬ 
tion, this group member must be of type cmpB. Since the 
daughter group contains the complete set of methods, it 
can delete the prev bond which joins the cmpA instances 
of the mother and daughter groups. 

When placed into the virtual world with a supply of the 
methods that comprise it, the following reaction occurs 

$$X + cmpA + cmpB + cmpC \rightarrow 2X.$$


Because cmpB does not check to see whether the compo¬ 
some already possesses a method before adding it to the 
daughter group, the fraction of reactants converted to com¬ 
plete composomes (self-replication efficiency) will be sig¬ 
nificantly less than 100%. 

Ribosomes 

Biological enzymes can be reified as chains of nucleotides 
or amino acids. The first can be read and copied but are spa¬ 
tially distributed and purely representational; the second are 
representationally opaque but compact and metabolically 
active. Dataflow graphs can be compiled into sequences of 
primitive combinators and reified in analogous ways: genes 
can be read and copied but do not manifest behaviors; en¬ 
zymes manifest behaviors but cannot be read or copied. A 
gene is a spatially extended chain of actors of type combinator
linked with directed bonds:

$$G_i = \mathop{>}_{j=1}^{|G_i|} c_i(j)$$

where $c_i(j)$ is combinator $j$ of gene $i$ and ($>$) are directed
bonds. As in the genomes of living cells, sets of genes that
are expressed together can be grouped together. A plasmid
is a sequence of genes joined with undirected bonds:

$$P = \mathop{|}_{i=1}^{|P|} \; \mathop{>}_{j=1}^{|G_i|} c_i(j)$$

where ($|$) are undirected bonds. An additional undirected
bond $c_{|P|}(|G_{|P|}|) \mid c_1(1)$ closes the chain. While plasmids
are spatially distributed chains of multiple actors, enzymes
are single actors of type method:

$$E_i = \Bigl( \mathop{>\!=\!>}_{j=1}^{|G_i|} c_i(j) \Bigr)^{+}$$

where ($>\!=\!>$) is Kleisli composition and $()^{+}$ is the constructor
for actors of type method. In addition to plasmids,
composed of genes, a minimum self-replicating system might
contain objects of three types. Ribosomes translate genes 
into enzymes and replisomes copy plasmids. Factories are 
copiers of compositional information, namely, the sets of 
enzymes and objects that comprise ribosomes, replisomes 
and factories themselves. A self-replicating system like this 
would possess semantic closure (Pattee, 1995) because it 
would construct the parts that comprise it (enzymes) from 
descriptions contained within itself (genes). Unlike the cell 
(where enzymes are sequences of amino acids and genes are 
sequences of nucleotides) enzymes and genes are built from 
the same elementary building blocks, i.e., combinators. 

Biological ribosomes translate descriptions of proteins 
encoded as sequences of nucleotides into polypeptides, se¬ 
quences of amino acids, the building blocks of proteins. 
A computational ribosome translates a plasmid into one or 
more enzymes by traversing genes while composing com¬ 
binators from the neighborhood matching those comprising 
the gene. In functional pseudocode, the ribosome evaluates 
the following expression: 

$$\mathrm{map}_{|} \, \bigl( ()^{+} \circ \mathrm{fold}_{>} \, (>\!=\!>) \bigr) \; P$$

where $\mathrm{map}_{|}$ maps functions over the genes $G_i$ that comprise
plasmid $P$ and $\mathrm{fold}_{>}$ is right fold over the combinators $c_i(j)$
that comprise a gene. A computational ribosome can be constructed
by defining four enzymes that perform these functions
and placing them inside an actor of type object:

$$R = \{ribA, ribI, ribE, ribT\}_0.$$

RibA first attaches $R$ to the plasmid by adding it to the group
of the initial combinator of some gene, $c_i(1)$. Afterwards,
the (now unnecessary) ribA is expelled (and $R$ becomes $R'$);
see Figure 3 (a).

After ribosome attachment, ribI finds an actor in the
neighborhood with type matching $c_i(1)$ and places it inside
$R'$; see Figure 3 (b). When $R'$ is at position $j$ on the plasmid,
ribE finds a neighbor with type matching $c_i(j+1)$ and
composes it with the combinator contained in $R'$, i.e., with
$c_i(1) >\!=\!> \cdots >\!=\!> c_i(j)$. It then advances the position of
$R'$ to $j+1$ by following the next bond; see Figure 3 (c).
This process continues until $R'$ reaches the last combinator
in the gene, $c_i(|G_i|)$, which possesses a hand bond, at which
point ribT promotes the combinator to a method, expels the
method, and moves the ribosome across the bond.
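
The functional reading of translation (map over genes, fold Kleisli composition over each gene's combinators, then unquote) can be sketched in Python. The sketch makes assumptions that are not in this paper: combinators are modeled as functions from an integer to a list of integers (the list monad), and the `Enzyme` class is a hypothetical stand-in for the $()^{+}$ constructor. It ignores space, diffusion and bonds entirely.

```python
from functools import reduce


def kleisli(f, g):
    """Left-to-right Kleisli composition (>=>) for the list monad."""
    return lambda x: [z for y in f(x) for z in g(y)]


class Enzyme:
    """Stand-in for ()^+: wraps a composed combinator so it can be run."""

    def __init__(self, program):
        self.program = program

    def run(self, x):
        return self.program(x)


def translate_gene(gene):
    """Right fold of (>=>) over a non-empty gene, followed by unquoting."""
    *init, last = gene
    composed = reduce(lambda acc, c: kleisli(c, acc), reversed(init), last)
    return Enzyme(composed)


def translate_plasmid(plasmid):
    """Map gene translation over the genes of a plasmid."""
    return [translate_gene(gene) for gene in plasmid]


# Two toy combinators and a two-gene plasmid (purely illustrative).
inc = lambda x: [x + 1]
dup = lambda x: [x, x]
enzymes = translate_plasmid([[inc, dup, inc], [dup, dup]])
print(enzymes[0].run(0))
print(enzymes[1].run(5))
```

Kleisli composition is associative, so the incremental left-to-right composition performed by ribE yields the same program as the right fold written in the expression above.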






Figure 3: a) Ribosome attaches itself to plasmid at gene origin
(marked by hand bond) and ejects ribA. b) Combinator
from neighborhood matching initial combinator is placed in¬ 
side ribosome, c) Combinator from neighborhood matching 
next combinator of plasmid is composed with combinator 
inside ribosome and ribosome advances. 

If plasmid $P$ and ribosome $R$ are placed in the
virtual world with a supply of primitive combinators
$\sum_{p \in P} \sum_{c \in C} h(p,c)\, c$ then the ribosome manufactures the
enzymes $\sum_{p \in P} E_p$ described by the plasmid

$$P + R + \sum_{p \in P} \sum_{c \in C} h(p,c)\, c \;\rightarrow\; P + R' + ribA + \sum_{p \in P} E_p$$

where $C$ is the set of 42 primitive combinators and $h(p,c)$ is
the number of combinators of type $c$ in $G_p$ and $E_p$, i.e., the
gene and enzyme reifications of behavior $p$.
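
The bookkeeping behind $h(p,c)$ can be made concrete with a short Python sketch. The gene contents and combinator names below are invented for illustration; only the counting scheme follows the text.

```python
from collections import Counter


def combinator_counts(genes):
    """h(p, c): for each behavior p, the number of combinators of each type c."""
    return {behavior: Counter(gene) for behavior, gene in genes.items()}


def supply_for(genes, circuits=1):
    """Total free combinators consumed by `circuits` full passes over the plasmid."""
    total = Counter()
    for counts in combinator_counts(genes).values():
        for combinator, count in counts.items():
            total[combinator] += circuits * count
    return total


# Hypothetical two-gene plasmid; real genes use the 42 primitive combinators.
genes = {"p1": ["a", "b", "a"], "p2": ["b", "c"]}
h = combinator_counts(genes)
supply = supply_for(genes)
print(h["p1"]["a"], dict(supply))
```

Each left-hand term $\sum_p \sum_c h(p,c)\,c$ in a reaction corresponds to one call of `supply_for` with `circuits=1`.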

Replisomes 

We have already defined a computational ribosome, i.e., an 
object that translates inert descriptions of behaviors encoded 
by a plasmid (genes) into behaviors reified as methods able 
to do actual work (enzymes). We now turn our attention to 
the problem of defining a computational replisome , an ob¬ 
ject that will replicate plasmids. In functional pseudocode, 
the replisome evaluates the following expression: 

$$\bigl( \mathrm{fold}_{|} \, (|) \circ \mathrm{map}_{|} \, (\mathrm{fold}_{>} \, (>)) \bigr) \; P$$

where ($|$) and ($>$) are functions that create undirected and
directed bonds, $\mathrm{fold}_{|}$ is right fold over the genes $G_i$ that comprise
plasmid $P$ and $\mathrm{fold}_{>}$ is right fold over the combinators
$c_i(j)$ that comprise a gene.

Biological replisomes copy plasmids in pairs. Replication 
begins when two replisomes are assembled at the plasmid’s 
replication origin. Each replisome manages one replication 
fork. The replication forks move away from the replication 
origin in opposite directions and replication is finished when 
the pair of replisomes reunite at a position on the plasmid 
opposite the origin. A computational replisome can be de¬ 
signed that works in a similar way. As in a cell, there are two 
replication forks. However, unlike a cell, only one moves; 
the other is stationary. The replisome manages the active 
replication fork. It is initially an object containing five en¬ 
zymes and an empty object of the same type as itself: 

$$Q = \{repA, repE, repF, repY, repZ, \{\,\}_2\}_2.$$

RepA first causes the replisome to attach to the plasmid. It
does this by adding the replisome to the group of one of
the plasmid’s combinators. Afterwards, the stationary replication
fork is marked by attaching the empty object $\{\,\}_2$
contained within the replisome to the combinator that precedes
the replisome’s own attachment site. RepA creates
a directed bond from the marker to the replisome and then
ejects itself since it is no longer needed; see Figure 4 (b).

Figure 4: a) A short segment of a plasmid and a replisome
containing five enzymes and an empty object marker of the
same type as itself. b) Replisome attaches itself to the plasmid
by joining the group of one of its combinators; attaches
the marker to the combinator preceding its own attachment
site; forms a directed bond with the marker; and ejects repA.
c) Replisome advances along the plasmid after splicing a
combinator of the correct type from the neighborhood into
the directed bond that trails it. This is the first combinator of
the daughter plasmid.

RepE and repF govern the motion of the replication fork. 
RepE finds a combinator in the neighborhood matching the 
combinator attached to replisome Q'. It moves Q' in the 
increasing direction (by joining the group of the combinator 
that follows the replisome’s own attachment site) and splices 
the neighbor into the growing chain (the incomplete daugh¬ 
ter plasmid) that trails it; see Figure 4 (c). RepF is very 
similar except that it moves the replication fork through the 
hand bonds that mark the boundaries between genes. 

Replication is complete when Q' encounters a marker, or 
more precisely, when it finds a marker attached to the com¬ 
binator that follows its own attachment site. In the most 
common case, the replisome and marker are situated within 
a single gene. RepY recognizes this situation and creates 
the final next bond, completing the daughter plasmid. Very 
infrequently, the replisome and marker straddle a boundary 
between two genes. RepZ recognizes this situation and cre¬ 
ates the final hand bond. In both cases, the replisome and 
marker are detached from the plasmid. 

If plasmid $P$ and replisome $Q$ are placed in the
virtual world with a supply of primitive combinators
$\sum_{p \in P} \sum_{c \in C} h(p,c)\, c$ then the replisome copies the plasmid

$$P + Q + \sum_{p \in P} \sum_{c \in C} h(p,c)\, c \;\rightarrow\; 2P + Q' + repA + \{\,\}_2.$$
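
Abstracting away space and bonds, the replisome's net effect is to build a structurally identical plasmid out of free combinators, consuming one matching combinator per position. The Python sketch below illustrates only that accounting; the function name `replicate` and the list-of-lists plasmid encoding are assumptions for this example.

```python
from collections import Counter


def replicate(plasmid, supply):
    """Copy a plasmid gene by gene, drawing matching combinators from `supply`.

    Returns the daughter plasmid and the combinators left over.
    Raises LookupError when the supply lacks a needed combinator type.
    """
    remaining = Counter(supply)
    daughter = []
    for gene in plasmid:
        copied = []
        for combinator in gene:
            if remaining[combinator] <= 0:
                raise LookupError(f"no free combinator of type {combinator!r}")
            remaining[combinator] -= 1
            copied.append(combinator)
        daughter.append(copied)
    return daughter, +remaining  # unary plus drops exhausted types


plasmid = [["a", "b"], ["a"]]
daughter, leftover = replicate(plasmid, {"a": 2, "b": 1, "c": 1})
print(daughter, dict(leftover))
```

Nothing is created or destroyed: the combinators in the daughter plus the leftovers equal the original supply, mirroring the conservation of mass in the reaction above.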

Self-Replicating Ribosome and Replisome Factories 

Figure 5: a) Self-replicating ribosome factory contains eight
enzymes and a model ribosome. b) Directed bonds connect
the factory to partially assembled ribosomes while an
undirected bond connects it to a partially assembled daughter
factory. c) One of the product ribosomes becomes the
model for the daughter factory while the second is available
to synthesize enzymes for the encompassing system.

Abstractly, factories are copiers of compositional information,
which is heritable information distinct from the genetic
information copied by replisomes and which ribosomes
translate into enzymes. Concretely, factories are objects
containing a specific set of enzymes and a model, which
can be either a ribosome or a replisome; see Figure 5 (a). A
factory’s enzymes can be grouped into four categories:

• FacP, facN and facH form prev, next and hand bonds with
empty objects from the neighborhood. The order in which
these three events occur is more or less random. The objects
{ } bonded to the mother factory by prev and next
bonds will become new instances of the model. The object
{ } bonded to the mother factory by the hand bond
will become the daughter factory; see Figure 5 (b).

• FacU moves enzymes with types matching contents of the 
model into the incomplete model instances. FacV moves 
enzymes with types matching contents of the mother fac¬ 
tory into the incomplete daughter factory. 

• FacX uses the generalized set difference operator to verify
that a new model instance, i.e., product, has all of
the enzymes that the old model contains. If so, it marks
the product as complete using a self-directed hand bond.
FacY does the same thing for the daughter factory but uses
a self-directed prev bond to indicate completeness.

• FacZ checks to see that both products have self-directed 
hand bonds and also that the daughter factory has a self- 
directed prev bond. If so, it 1) deletes the prev and next 
bonds connecting the mother factory and the products; 2) 
moves one of the completed products into the daughter 
factory (to serve as its model); and 3) deletes the hand 
bond connecting the mother and daughter factories; see 
Figure 5 (c). 
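
The completeness test performed by facX and facY can be sketched with multiset difference. Interpreting the "generalized set difference operator" as difference on multisets is an assumption of this sketch, and the function names are hypothetical.

```python
from collections import Counter


def missing_enzymes(model, product):
    """Multiset of enzyme types the model contains but the product still lacks."""
    return Counter(model) - Counter(product)  # keeps only positive counts


def is_complete(model, product):
    """True when the product holds every enzyme the model holds."""
    return not missing_enzymes(model, product)


model = ["ribA", "ribI", "ribE", "ribT"]
partial = ["ribA", "ribE", "ribI"]
print(dict(missing_enzymes(model, partial)), is_complete(model, model))
```

Under this reading, extra enzymes in the product do not prevent it from being marked complete; only missing ones do.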

Given the above enzymes, it is now possible to define a 
self-replicating ribosome factory: 

$$F_R = \{facP, facN, facH, facU, facV, facX, facY, facZ, R\}_1.$$

When placed in the virtual world with a supply of empty
objects and enzymes comprising ribosomes $\sum_{r \in R} E_r$ and factories
$\sum_{f \in F} E_f$ the ribosome factory constructs a new ribosome
factory and a new ribosome:

$$F_R + 2\{\,\}_0 + \{\,\}_1 + 2\sum_{r \in R} E_r + \sum_{f \in F} E_f \;\rightarrow\; 2F_R + R.$$

A self-replicating replisome factory $F_Q$ can be defined similarly.
When placed in the virtual world with a supply of
empty objects and enzymes comprising replisomes $\sum_{q \in Q} E_q$
and factories $\sum_{f \in F} E_f$ the replisome factory constructs a new
replisome factory and a new replisome:

$$F_Q + 4\{\,\}_2 + \{\,\}_3 + 2\sum_{q \in Q} E_q + \sum_{f \in F} E_f \;\rightarrow\; 2F_Q + Q.$$

Note that the left hand side of the reaction contains four 
empty objects { }2 instead of two; the extras are the markers 
contained by the new replisome and replisome model. 

We now have all of the components needed to build a self- 
replicating system of ribosome and replisome factories. Un¬ 
like the composome, which copied itself solely by reflec¬ 
tion, the self-replicating system is a quine that translates and 
replicates a self-description reified as a data structure within 
the virtual world itself: 

$$P_{17} = ribA \mid ribI \mid ribE \mid ribT \mid repA \mid repE \mid repF \mid repY \mid repZ \mid facP \mid facN \mid facH \mid facU \mid facV \mid facX \mid facY \mid facZ.$$

This genome consists of a single plasmid containing 586
combinators comprising 17 genes. The minimum phenome
required for bootstrapping the self-replicating system consists
of a replisome factory $F_Q$, a ribosome factory $F_R$ and a
ribosome $R$. When genome $P_{17}$ and phenome $F_Q + F_R + R$
are placed in the virtual world with a supply of empty objects
$\{\,\}_k$ and combinators $\sum_{p \in P_{17}} \sum_{c \in C} h(p,c)\, c$ the system increases
the redundancy of all of its component parts:

$$P_{17} + F_Q + F_R + R + 2\{\,\}_0 + \{\,\}_1 + 4\{\,\}_2 + \{\,\}_3 + 3\sum_{p \in P_{17}} \sum_{c \in C} h(p,c)\, c$$
$$\rightarrow\; 2P_{17} + 2F_Q + 2F_R + Q' + repA + \{\,\}_2 + R + R' + ribA.$$

Note that there are three instances of $\sum_{p \in P_{17}} \sum_{c \in C} h(p,c)\, c$ on the
left side of the equation. The ribosome $R$ consumes the first
two, making two full circuits of the plasmid and synthesizing the
system’s enzymes

$$P_{17} + R + 2\sum_{p \in P_{17}} \sum_{c \in C} h(p,c)\, c \;\rightarrow\; P_{17} + R' + ribA + 2\sum_{p \in P_{17}} E_p$$

while the replisome $Q$ (assembled by $F_Q$) uses the last to copy
the plasmid.

Comparison of Normalized Complexities 

It is useful to compare the normalized complexities of the 
self-replicating system of ribosome and replisome factories 
and the composome defined earlier. Recall that the compo¬ 
some X is composed of 3 enzymes of 3 types: cmpA, cmpB 
and cmpC; these enzymes are in turn composed of 66 com¬ 
binators of 17 types. Because the enzymes are defined out¬ 
side the system, their complexity is non-contingent, and the 
composome’s normalized complexity is quite low: 

$$\frac{K(\{cmpA, cmpB, cmpC\} \mid cmpA + cmpB + cmpC)}{K(cmpA + cmpB + cmpC \mid OOCC_{17}) + K(OOCC_{17} \mid ACA) + K(ACA)}$$

where $K(cmpA + cmpB + cmpC \mid OOCC_{17})$ is the portion
of the composome’s non-contingent complexity contained
in its three enzymes.

In contrast, the self-replicating system of ribosome and 
replisome factories is composed of 17 different behaviors 
reified as both genes and enzymes; these genes and en¬ 
zymes are in turn composed of 3 × 586 = 1758 combinators
of 31 types. Furthermore, because the enzymes are
defined within the system itself (by the genes), their com¬ 
plexity is (unlike that of the composome’s enzymes) contin¬ 
gent. Consequently, the self-replicating system of ribosome 
and replisome factories possesses significantly higher nor¬ 
malized complexity than the composome: 

$$\frac{K(F_Q + F_R + R \mid \sum_{p \in P_{17}} E_p) + K(\sum_{p \in P_{17}} E_p \mid P_{17}, R, OOCC_{31}) + K(P_{17} \mid OOCC_{31})}{K(OOCC_{31} \mid ACA) + K(ACA)}$$

where $K(F_Q + F_R + R \mid \sum_{p \in P_{17}} E_p)$ and $K(P_{17} \mid OOCC_{31})$ are
the compositional and genetically encoded portions of
the self-replicating system’s contingent complexity and
$K(\sum_{p \in P_{17}} E_p \mid P_{17}, R, OOCC_{31})$ is zero because the plasmid $P_{17}$
encodes the enzymes $\sum_{p \in P_{17}} E_p$ using a process defined by the
ribosome $R$ and the object-oriented combinator chemistry.

Conclusion 

This paper introduced the concept of a self-replicating sys¬ 
tem’s normalized complexity. Normalized complexity per¬ 
mits comparisons between artificial organisms defined in 
different virtual worlds by explicitly accounting for the rela¬ 
tive complexities of both organism and world. An object- 
oriented combinator chemistry (introduced in prior work) 
was used to define a parallel, asynchronous, spatially dis¬ 
tributed self-replicating system modeled in part on the living 
cell. The high normalized complexity of this self-replicating 
system of ribosome and replisome factories was contrasted 
with that of a simple composome. 

Abstract

Liquid handling robots are rarely used in the domain of
artificial life. In this field, transitory behaviours of
non-equilibrium man-made systems are studied, which calls for
automatic monitoring and logging of results. In addition,
artificial life experiments are dynamic and change frequently,
which makes it difficult to apply conventional liquid handling
robots, as these are designed to automate a pre-defined task.

To address these issues, we have developed an open-source
liquid handling robot, EvoBot. Its modular design makes it
possible to reconfigure the robot for different experiments
and lets users add functionality simply by developing a
function-specific module. In addition, it provides sensors and
extra functionality for monitoring an experiment, which allows
researchers to perform interactive experiments with the aim of
prolonging non-equilibrium conditions. In this paper, we
describe the modular design of EvoBot, document its
performance, and provide a novel example of an interactive
experiment in artificial life, in which the robot nurtures a
microbial fuel cell based on its voltage output.

Introduction 

Liquid handling robots are often employed in chemical 
and biochemical laboratories in order to automate repetitive 
tasks (see (Kong et al., 2012) for a useful overview of liquid 
handling robots for lab automation). The benefit of this is 
typically an increase in reliability, throughput and precision 
when compared to manual liquid handling coupled with a 
reduction in labour cost. However, liquid handling robots 
are rarely used in the domain of artificial life. 

There are two types of challenges that make existing liq¬ 
uid handling robots inappropriate for experiments in artifi¬ 
cial life. The first relates to the functionality needed from 
the robot to be useful in artificial life research. In artificial 
life we are mostly interested in the initial condition of an 
experiment and how it develops but not typically in the end 
result, because the end result is equilibrium which in the 
context of artificial life corresponds to death. This means 
that not only should the robot prepare experiments, but also 
employ automatic monitoring and logging of results.
Furthermore, it would be desirable if the liquid handling robot,
based on this monitoring, could interact with the experiment to
extend its lifetime. Solving this challenge has been the basis
for the results of (Gutierrez, 2012; Gutierrez et al., 2014;
Hanczyc et al., 2015; Muller et al., 2015), who all made an 
ad-hoc liquid handling robot for specific experiments in ar¬ 
tificial chemical life. 

The second type of challenge, which is the focus of our
work, is the practical challenge of employing liquid handling
robots in artificial life research. Artificial life research
is often performed in small, government-funded labs, and as
such the cost of acquisition is a limiting factor. Artificial
life experiments are not static but develop in response to
the insights obtained from them, which makes it difficult to
apply conventional liquid handling robots, as reprogramming
and reconfiguration are often tedious and time-consuming, if
possible at all, for the end user. Furthermore, conventional
liquid handling robots are often set up to perform only one
specific task, which makes it difficult to add the
functionality required for new types of experiments.
If we could provide solutions to these challenges, artificial
life research could also benefit from the advantages of
automation, namely increased reliability, throughput, and
precision, and reduced cost. Turning these challenges around,
we are looking for liquid handling robot technology that has
the following characteristics:

• Reconfigurable 

• Versatile 

• Low cost 

• Extendable 

In order to address these challenges, we propose the use of
a modular design that borrows a significant number of
advantages from modular robots (Yim et al., 2007). A modular
design allows a non-expert user to reconfigure the robot for
different experiments by swapping modules in and out. Expert
knowledge is only required when designing modules with
new functionalities. A modular approach also increases the
versatility of liquid handling robots because they can easily
be reconfigured for many different types of experiment
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Figure 1: The EvoBot liquid handling robot (left) and a close-up of the movable head equipped with one syringe module. 


during their life-time. We have made EvoBot open-source, 
which reduces the cost of acquisition and also allows re¬ 
searchers to build on our reliable platform and extend it with 
modules that have the necessary functionality for their spe¬ 
cific experiment 1 . 

In this paper we present the modular design of the liq¬ 
uid handling robot, EvoBot, and carefully document its per¬ 
formance to make it possible for potential users to evaluate 
how appropriate it is for their experiment. We have already 
demonstrated its usefulness in artificial chemical life (Ne- 
jatimoharrami et al., 2016), but here we further evaluate its 
use in a new application domain, nurturing microbial fuel 
cells (MFCs). MFCs are devices where microbes convert 
organic matter directly into electricity. MFCs have demon¬ 
strated their utility as a basis for building robots with an ar¬ 
tificial metabolism (Ieropoulos et al., 2010). 

We conclude that EvoBot, owing to its versatility,
extendability, and low cost, was successfully applied in
the new application domain of nurturing microbial fuel cells
with very limited modification. We also show that using the
robot to maintain microbial fuel cells has several advantages
over nurturing them manually, and may thus in time provide
insights that can help researchers develop more efficient
microbial fuel cells.

EvoBot Design Principles 

As outlined above, a key goal of the EvoBot design was to 
develop a robotic platform, which can be configured for a 

1 Source code and design files can be found in Git
repositories at https://bitbucket.org/afaina/evobliss-software
and https://bitbucket.org/afaina/evobliss-hardware,
respectively.


wide range of experiments without the involvement of an 
expert user. In order to achieve this, we modularised the 
design building on research in the field of modular robotics 
(Yim et al., 2007). A modular approach allows us to en¬ 
capsulate complexity while providing a simple mechatronics 
plug’n’play interface to the system. 

EvoBot (see Figure 1) consists of one structural frame 
and three types of layers, which in the default configura¬ 
tion are organised as follows: the top layer carries actuation, 
the middle layer is the experimental section, and the bottom 
layer is an observation platform. However, this default lay¬ 
out can easily be amended, e.g. several experimental layers 
can be introduced if a cascading experiment is under
investigation. The layers can easily be moved up and down
in the frame. Functionality is provided by modules, so new
capabilities can be added incrementally as new modules.

In order to create a high-quality yet cheap platform, we
built the robot from off-the-shelf components and, where
possible, used components that are popular in the open-source
3D printer community and therefore readily available. For
instance, we used Nema 17 motors for actuation and
Arduino-based electronics for control of the robot. Another
key principle was to favour laser-cut acrylic over 3D
printing, because laser-cut parts take less time to produce
and are of higher quality than 3D-printed ones. However, we
did use 3D-printed components outside the core mechanical
structure, primarily inside the syringe module (described
later), because of the geometrical flexibility afforded by
3D printers.
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EvoBot Implementation 

In the following section, we will provide an overview of 
the hardware (mechanics & electronics), and software of 
EvoBot. 

Mechanics 

First, we will describe the layers of EvoBot in more detail 
followed by the implementation of the modules. 

Frame The frame is made of aluminium profiles and
provides the physical support for the layers, which can be
attached to the frame at fixed heights spaced 20 mm apart.
Four adjustable feet allow the robot to be levelled. The
frame measures 600 × 400 × 600 mm, but it can easily
be extended.

Layers The EvoBot platform is organised into three types 
of layer: actuation, experiment, and observation layers. 

The actuation layer contains a head, which can move in
the horizontal plane using two belt and pinion mechanisms
and two stepper motors. Up to 11 modules can be mounted
on this head to provide different functionalities. At this
point, only an actuated syringe module and a heavy payload
module have been implemented, but various actuator
and sensor modules are envisaged, e.g. temperature sensor,
pH sensor, gripper for manipulation of dishes, extruder for
printing reaction vessels, etc.

The experimental layer is essentially just a frame with a
glass plate where vessels can be organised as required by
the specific experiment. There is a hole in one corner, into
which vessels can be moved and dropped for automatic disposal;
together with a Petri dish dispenser system under development,
this will allow a large number of experiments to be done in
sequence. However, we have found that for now it is enough to
clean reaction vessels with three water wash cycles, in which
the vessels are repeatedly filled with water and emptied
using the syringe modules. This makes the dispenser
functionality less critical for long-term operation. For more
sensitive experiments, however, the vessels and the syringe
tips may have to be replaced for each experiment.

The observation layer is essentially the same as the ac¬ 
tuation layer except that modules cannot interact physically 
with the experiments above, because they are shielded by 
the glass plate. This limits the useful modules for this layer 
to modules that do not directly manipulate the experiments. 
Currently, a static webcam is used in the observation layer 
to monitor the experiment and provide feedback to the robot. 
However, in the longer term thermal imaging, magnetic stir¬ 
ring, liquid effluent sampling or the like could be integrated 
in modules for the observation layer. 

Syringe and Heavy Payload Module The syringe module,
shown in Figure 2.a-b, has two degrees of freedom. It
has a linear stepper motor (a stepper motor with a lead screw



Figure 2: Syringe module (a, b) and heavy payload module 
(c). 

and an internal nut) for moving the plunger up and down and 
a rack and pinion mechanism with another stepper motor for 
moving the syringe up and down. Syringes up to 20ml can 
be used and they can easily be replaced giving the user the 
opportunity to use the syringe that matches the experimental 
requirements. 

The heavy payload module is designed to hold and move 
up and down big and heavy parts. A lead screw mechanism 
is used to hold the payload even if its motor is switched off. 
A Nema 17 stepper motor moves the payload but it can also 
be manually operated with a crank. Its stroke is 80mm and 
the maximum speed is 8mm/s. This module is shown in 
Figure 2.c and its end effector is designed to hold an OCT 
scanner, but it can easily be modified to hold other devices. 

More modules are under development including a mod¬ 
ule with a gripper, and a module to measure pH. However, an 
EvoBot with one syringe is enough to perform useful experi¬ 
ments in artificial chemical life as demonstrated in (Hanczyc 
et al., 2015; Gutierrez et al., 2014). 

Electronics 

The core of the electronics of EvoBot is based on electronics
used in the open-source 3D printer community. Specifically,
we use the Arduino MEGA 2560 R3 with the RAMPS v1.4
shield. This provides us with a mature and cheap electronics
platform to build on and, perhaps more importantly, allows
us to build on existing software for open-source 3D printers.

A key aspect of the electronics design was to keep the
number of wires between components as low as possible,
because wires can interfere with experiments and become a
source of error if disconnected accidentally, and because
fewer wires give the robot a clean look. To this end, a
circuit board was designed that routes power and communication
to the modules on the head of the robot. This board is fairly
large, but consists only of simple routing, two input/output
port expanders with a serial interface (I2C), and spring
connectors. For the modules, a custom board was also made
that contains two stepper drivers with SPI communication
(L6470) for driving the stepper motors of the module, see
Figure 2.a. When a module is placed on the head, the spring
connectors touch the pads on the module’s board and
transmit power and the SPI signals. Furthermore, the number
of wires between the Arduino and the head is reduced by using
the I2C port expanders to generate the Chip Select signals
for the SPI communication. Using this approach, only a power
cable and an 8-way ribbon cable carrying the I2C and SPI
buses are necessary to manage the 22 motors of the 11 syringe
modules that can be placed on the head.
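
To make the wiring arithmetic concrete, the following sketch maps each of the 22 stepper drivers to a chip-select output on one of the two I2C port expanders. It is an illustration only: the 16 outputs per expander, the I2C addresses, the allocation order, and the function name chip_select are all assumptions, since the text does not specify them.

```python
# Hypothetical chip-select allocation for the EvoBot head (illustration only).
from __future__ import annotations

MODULE_SLOTS = 11                  # module positions on the head (from the text)
DRIVERS_PER_MODULE = 2             # one SPI driver per stepper motor (from the text)
PINS_PER_EXPANDER = 16             # ASSUMPTION: 16-bit I2C port expanders
EXPANDER_ADDRESSES = (0x20, 0x21)  # ASSUMPTION: example I2C addresses


def chip_select(slot: int, driver: int) -> tuple[int, int]:
    """Return (expander I2C address, pin) for one driver's chip-select line."""
    if not 0 <= slot < MODULE_SLOTS:
        raise ValueError(f"slot must be in 0..{MODULE_SLOTS - 1}")
    if not 0 <= driver < DRIVERS_PER_MODULE:
        raise ValueError(f"driver must be in 0..{DRIVERS_PER_MODULE - 1}")
    index = slot * DRIVERS_PER_MODULE + driver   # 0..21
    expander, pin = divmod(index, PINS_PER_EXPANDER)
    return EXPANDER_ADDRESSES[expander], pin


total_drivers = MODULE_SLOTS * DRIVERS_PER_MODULE
# Under the 16-pin assumption, two expanders offer 32 outputs, enough for 22.
assert total_drivers <= PINS_PER_EXPANDER * len(EXPANDER_ADDRESSES)
print(total_drivers)       # 22
print(chip_select(0, 0))   # (32, 0)
print(chip_select(10, 1))  # (33, 5)
```

The point of the sketch is that the chip-select lines never leave the head: the Arduino only needs the shared I2C and SPI buses, which is why a single ribbon cable suffices.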

Software 

The goal of the software was to provide the end-user with 
a simple programming interface to the robot. The software 
has a host side and a robot side. The host side communi¬ 
cates with the robot side over a serial USB connection and 
the robot side software is a modified version of the Marlin 
firmware used in open-source 3D printers. 

On the host side, we have chosen Python as the implemen¬ 
tation language as this was the language with which our col¬ 
laborators have most experience and also due to its simplic¬ 
ity. The software is divided into a graphical user-interface 
for manual control of the robot and a simple application pro¬ 
gramming interface. 

Robot Manual Control The purpose of the manual control
graphical user interface is simply to make it possible to
test various functions of the robot without having to resort
to programming. However, the most important use of the manual
control program is to learn the positions of objects of
interest in the robot’s coordinate frame. For example, the
user could move the robot until it is approximately above the
centre of a Petri dish of interest and lower the syringe
until the tip is in the vessel. This is an empirical way of
defining the geometry of an experiment, and perhaps the most
practical one for a range of experiments.

Application Programming Interface The application 
programming interface is also kept as simple as possible. 
The interface gives the programmer access to moving the 
robot head, moving the syringes and plungers, and inquiring 
about the positions of these elements. 

The application programming interface is built on top of
printcore, an open-source Python library for interacting
with 3D printers, and pySerial, which handles the
serial communication. The printcore library, however, has
been heavily reduced and is likely to become unnecessary in
the near future.

Table 1: Max speed and accelerations of the robot.

                        X      Y      Z (Syringe)   Plunger
Max speed (mm/s)        180    180    235           8
Acceleration (mm/s^2)   3000   3000   235           4

At the bottom of the software stack we have implemented
a basic simulation that can be swapped in for the serial
communication library. This makes it possible to debug
the upper layers without actually moving the physical robot.
This is a feature that has saved a significant amount of
development time.
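
The idea of swapping a simulation in for the serial link can be sketched as follows. This is a minimal illustration and not the EvoBot code: the names Transport, SimulatedTransport, Head and move_to are hypothetical, and the only detail taken from outside the text is that G1 is the standard G-code linear move understood by Marlin.

```python
# Illustrative sketch only: class and method names are hypothetical and are
# not taken from the EvoBot code base.
from typing import List, Protocol


class Transport(Protocol):
    """Anything that can deliver one line of G-code to the robot."""

    def send(self, line: str) -> None: ...


class SimulatedTransport:
    """Stands in for the serial link; records commands instead of sending."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def send(self, line: str) -> None:
        self.sent.append(line)


class Head:
    """Upper-layer logic that does not know which transport it talks to."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport
        self.position = (0.0, 0.0)

    def move_to(self, x: float, y: float) -> None:
        # G1 is the standard G-code linear move.
        self._transport.send(f"G1 X{x:.2f} Y{y:.2f}")
        self.position = (x, y)


sim = SimulatedTransport()
head = Head(sim)
head.move_to(120.0, 45.5)
print(sim.sent)  # ['G1 X120.00 Y45.50']
```

A hardware-backed transport would expose the same send method and write the line to the serial port instead, so the upper layers can be exercised unchanged against either one.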

Marlin-Based Firmware On the robot side we run an ex¬ 
tended version of Marlin, a firmware used to control open- 
source 3D printers. This gives us a mature basis for our robot 
controller. The head of the actuation layer is controlled di¬ 
rectly using the functionality Marlin provides. For the sy¬ 
ringe or heavy payload modules, we have extended Marlin 
with G-code commands for interacting with them. The use 
of Marlin is also beneficial as it is our plan to make an ex¬ 
truder module, which can 3D print reaction vessels and thus 
this aspect of the firmware can also be put to good use. 

Precision and Performance 

For the reader interested in understanding if EvoBot is suit¬ 
able for their experiments and to facilitate comparison be¬ 
tween liquid handling robots, we provide below a careful 
investigation of the precision and performance of EvoBot. 

Speed 

A key parameter when performing experiments is the speed of
the robot, because experiments can take a long time if the
robot is slow or cannot manage a set of tests simultaneously.
With this in mind, the maximum speed of the robot was tested
for each axis. The test consisted of movements along each
axis with increasing acceleration and maximum speed until the
robot lost steps. In other words, the maximum speed is
reached when the positioning of the robot after several
movements is no longer accurate. Table 1 contains the results
of these tests and shows that the speeds are significantly
higher than those requested by our collaborators.

Positioning Performance 

The positioning performance has been evaluated based on 
the ISO 9283:1998. Thus, one syringe module was modified 
to hold a test probe, instead of a syringe. This probe was 
used as an end effector to touch a dial indicator, which has an 
accuracy of 0.01mm. Given the short stroke of the indicator 
(25mm), we ran two different tests to calculate the accuracy 
and the repeatability of the robot on each axis. 
Table 2: Positioning accuracy for each axis of the robot.

          0 mm    1 mm    2 mm    10 mm   20 mm
X (mm)    0.04    0.03   -0.04   -0.11   -0.17
Y (mm)    0.01   -0.05   -0.08   -0.14   -0.18
Z (mm)    0.10    0.02    0.09    0.28    0.35


Table 3: Repeatability for each axis of the robot.

           X       Y       Z
RP (mm)    0.067   0.013   0.106


Accuracy is simply defined as $A = \bar{q} - q_t$, where $q_t$ is
the target position and $\bar{q}$ is the average of $n$ measured
coordinates. In this test, the robot was homed, and the dial
indicator was placed in contact with the probe and set to zero.
Then, the robot moved one axis sequentially to five different
positions (1, 2, 10, 20 and 0 mm) and the real position of the
probe was measured. This sequence was repeated 30 times
and the accuracy for each point was calculated. The data are
shown in Table 2, where we can observe that the positioning
is quite reasonable for a low-cost machine and there is
no noticeable discrepancy. Furthermore, the gradual increase
in the error when moving the robot to 10 and 20 mm
is mainly caused by a misalignment between the axis of the
indicator and the axis of the robot. This is confirmed by
moving the robot over longer distances, where we do not detect
any appreciable error. In the future, we will employ computer
vision technology to calibrate the positioning accuracy by
measuring the three axes at the same time.
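
As a small worked example of the accuracy definition, the sketch below computes the mean measured position minus the target. The readings are invented for illustration and are not data from the paper; the function name positioning_accuracy is likewise hypothetical.

```python
from statistics import mean


def positioning_accuracy(measured_mm, target_mm):
    """Accuracy A: mean of the measured positions minus the target position."""
    return mean(measured_mm) - target_mm


# Made-up dial-indicator readings (mm) for a 10 mm target; NOT the paper's data.
readings = [9.88, 9.90, 9.89, 9.87, 9.91]
print(round(positioning_accuracy(readings, 10.0), 3))  # -0.11
```

A negative value means the axis stopped short of the target on average, which is how the sign of the entries in Table 2 should be read under this definition.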

To measure the repeatability, or precision, of the robot, we
ran another experiment. Five points were defined across
the workspace of the robot, and the robot moved the
probe sequentially from one point to the next. At the
last point, the end effector touched a dial indicator to
measure its coordinates. This sequence was repeated 30 times
for each axis and repeatability was calculated using
Equation 1, where $q_i$ are the measurements, $\bar{q}$ is the
average of these measurements, $n$ is the number of trials, and
$S_q$ is the standard deviation. Table 3 shows the repeatability
for each axis. The results show very good repeatability on
all three axes, around or below 0.1 mm.


$$
RP = 3 S_q = 3 \sqrt{\frac{\sum_{i=1}^{n} (q_i - \bar{q})^2}{n - 1}} \qquad (1)
$$
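
Equation 1 can be evaluated directly with the sample standard deviation, which uses the same n − 1 denominator. The sketch below is illustrative only; the readings are invented and the function name repeatability is hypothetical.

```python
from statistics import stdev


def repeatability(measurements_mm):
    """RP = 3 * S_q, where S_q is the sample standard deviation (n - 1)."""
    return 3 * stdev(measurements_mm)


# Made-up dial-indicator readings (mm) at the final point; NOT the paper's data.
samples = [0.00, 0.02, -0.02, 0.01, -0.01]
print(round(repeatability(samples), 4))  # 0.0474
```

Note that statistics.stdev, rather than statistics.pstdev, is the one that matches the n − 1 term in Equation 1.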

Liquid Handling Performance 

In order to measure the precision and accuracy of handling 
droplets, we performed various experiments with different 



Figure 3: Boxplot comparing experiments with the 100 µl
syringe. The whiskers represent the lowest and highest datum
still within 1.5 IQR (interquartile range) of the lower quartile
or of the upper quartile, respectively.


syringe sizes, needle diameters and droplet volumes. The
experiment was to draw distilled water from a Petri dish
and dispense it onto a scale with ±0.003 g precision,
thereby obtaining the droplet volume from the mass-volume
relation. Four experiments were performed with a professional
100 µl syringe (Hamilton 710 LT) to handle liquid volumes
of 15 µl and 30 µl, with needle internal diameters of 0.96 mm
and 1.7 mm. To obtain better results in these experiments, a
small quantity of air was taken into the syringe prior to
drawing water. Four other experiments were performed with
a cheap disposable 5 ml syringe (Braun Omnifix) to handle
liquid volumes of 1 ml and 0.5 ml. In these experiments, the
needle internal diameters were again 0.96 mm and 1.7 mm. Each
experiment was run 30 times.
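
The mass-volume conversion, and what the scale precision means for the smallest droplets, can be sketched as follows. The water density is an assumption (a textbook value for roughly room temperature) and the function name is hypothetical; only the ±0.003 g scale precision and the 15 µl droplet volume come from the text.

```python
WATER_DENSITY_G_PER_ML = 0.998  # ASSUMPTION: distilled water near 20 degrees C
SCALE_PRECISION_G = 0.003       # scale precision, from the text


def mass_to_volume_ul(mass_g, density_g_per_ml=WATER_DENSITY_G_PER_ML):
    """Convert a weighed mass of water (g) to a volume in microlitres."""
    return mass_g / density_g_per_ml * 1000.0


# Volume equivalent of the scale precision, and its size relative to 15 µl.
scale_uncertainty_ul = mass_to_volume_ul(SCALE_PRECISION_G)
print(round(scale_uncertainty_ul, 2))             # 3.01
print(round(100 * scale_uncertainty_ul / 15, 1))  # 20.0
```

Under these assumptions the scale alone contributes about 3 µl, roughly a fifth of a 15 µl droplet, so the error introduced by the scale matters when judging the results for the small syringe.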

The results of the experiments with the 100 µl and 5 ml
syringes are compared using box plots in Figures 3 and 4,
respectively. Considering the error introduced by the scale,
the results obtained with the 100 µl syringe are quite
reasonable. They are all accurate and the repeatability is
good, except for the 15 µl test with the 1.7 mm needle. This
is due to surface tension, which randomly prevents the last
droplet from being ejected. The residual water in the tip
is taken back in during the next test and dispensed together
with the fresh sample, which results in a correspondingly
larger volume. This explains the outliers of the experiment
(0 and 25 µl respectively). The experiments with 1 and 0.5 ml
show some inaccuracy, but an acceptable repeatability. The
robot could nevertheless be calibrated to increase the
accuracy while keeping costs low. In addition,
(again due to surface tension) the last droplet is lost in
almost all the tests, which decreases the repeatability. Again,
this effect is more noticeable for the needle with the 1.7 mm
diameter. We will study how to avoid this negative effect
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Figure 4: Boxplot comparing experiments with 5ml syringe. 
The whiskers represent the lowest and highest datum still 
within 1.5 IQR of the lower quartile or of the upper quartile, 
respectively. 


in future work. Some alternatives could be to increase the 
speed of the plunger while dispensing the liquid, to generate 
a small vibration when moving up the syringe quickly or to 
use pipettes with low surface tension. 

Interactive Feeding of Microbial Fuel Cells 

In this experiment we document how we applied EvoBot to
the task of maintaining microbial fuel cells (MFCs). The
microbial metabolism utilises the carbon energy source within
the anode chamber, which eventually depletes the carbon
content and decreases the voltage output of the MFC; in
other words, as the fuel runs out, the voltage decreases.
The robot therefore feeds the MFC with more organic
material when the voltage falls below a threshold. As a
result of this interaction, the experiment is prolonged.

In order to analyse the advantages of using a liquid han¬ 
dling robot, two parallel microbial fuel cell experiments 
were performed using identical materials and methods with 
the only difference being that one of them was carried out 
manually, as a replica (control) experiment, and the other 
one was carried out by the EvoBot platform (Figure 5). In 
the case of the robot experiment, the voltage is sampled ev¬ 
ery minute, the MFCs are hydrated every four hours and 
only fed if the voltage is below the specified threshold. In 
contrast, the voltage of the replica (on the bench) was sam¬ 
pled every three minutes, and the MFCs were hydrated twice 
a day - morning and afternoon - and fed once every morning. 


Microbial Fuel Cell Structure 

Nine small-scale, 3D printed from Nanocure® resin, open- 
to-air cathode MFCs were used in these experiments. The 
volume of each of the fuel cell anode chambers was 6.25 
mL, and the anode and cathode chambers were separated by 
a single sheet of activated cation exchange membrane,
CMI-7000S (Membranes International Inc., Ringwood, USA).
Two rubber gaskets, one for each half-cell, sandwiched the 
membrane, and ensured watertight sealing, after the two 
chambers were bolted together using stainless steel studding 
and nuts. 

Electrode Material (Anode and Cathode) 

Untreated (catalyst-free) carbon fibre veil with 30 g/m²
carbon loading (PMF Composites, Dorset, U.K.) was used as
the anode electrode, with a total surface area of 168 cm². The
anode electrode was folded 5 times, until the projected
(exposed) surface area was 5.25 cm² and it could therefore
fit into the anodic chamber (18 mm × 28 mm). The cathode
electrode was made of two layers, a gas diffusion layer
(GDL) and a micro-porous layer (MPL). The GDL was a
single sheet of carbon veil coated with 30%
polytetrafluoroethylene (PTFE) (Sigma Aldrich, UK). Once the
GDL had cured, activated carbon paste was applied on top to
form a thick MPL (1 mm). The activated carbon paste was a
mixture of activated carbon powder (G.Baldwins & Co., London,
U.K.) blended with PTFE in a 4:1 ratio and deionised water
(120 mL). The activated carbon paste was then hot pressed
using a household iron (Gajda et al., 2015), and subsequently
heated for 15 minutes to 200°C to allow MPL liquefaction.
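
As a quick consistency check on the anode geometry, and assuming that each of the five folds halves the projected area, the reported figures agree:

```latex
A_{\text{projected}} = \frac{A_{\text{total}}}{2^{5}} = \frac{168\ \text{cm}^2}{32} = 5.25\ \text{cm}^2
```

The halving-per-fold assumption is ours; the text states only the total area, the number of folds, and the resulting projected area.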

Inoculation 

For the MFC experiment on the EvoBot platform, the
inoculation (i.e. introduction of live microorganisms into a
sterile MFC) was done using the anolyte from already
established MFCs. This was activated sludge fed with carbon
sources such as tryptone yeast-extract (TYE) over a period
of months, which had been sieved to remove large particles
(>1 mm) so as to prevent blockage of the syringe needle
on EvoBot. For the replica bench experiment, neat activated
sludge supplied by the Wessex Water Scientific Laboratory
(Saltford, UK) was used as the initial inoculum. All MFCs
in both experiments were kept under a fixed load of 3.9 kΩ
for the whole duration.
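
Because each MFC runs under a fixed resistive load, a voltage reading converts to power through P = V²/R. The sketch below is a minimal illustration of that conversion; the function name power_uw is hypothetical, and the 80 mV example value is the feeding threshold given in the caption of Figure 6.

```python
LOAD_OHMS = 3900.0  # fixed 3.9 kOhm load, from the text


def power_uw(voltage_mv, load_ohms=LOAD_OHMS):
    """Power in microwatts dissipated in the load at a given voltage in mV."""
    volts = voltage_mv / 1000.0
    return volts ** 2 / load_ohms * 1e6


print(round(power_uw(80.0), 2))  # 1.64
```

So the feeding threshold corresponds to about 1.64 µW per cell, and because power scales with the square of the voltage, doubling the voltage quadruples the power.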

Experimental software setup 

Each MFC was individually connected to a separate channel
on the Picolog data acquisition unit (ADC-24, Pico
Technology, Cambridgeshire, UK), which was then connected to
a PC, so that the DC voltage output of each unit could be
continuously recorded. A set of MFC feed functions was
created in LabVIEW (National Instruments Corporation [UK]
Ltd, Berkshire, UK), which sampled the Picolog DLL file
every 60 seconds for the MFC voltage reading. A threshold
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Figure 5: Interactive Feeding of Microbial Fuel Cells. Nine 
individually connected MFCs were operated with the EvoBot
platform. The experiment took place on the experiment
layer of the robot. The replica bench experiment was set up
in exactly the same manner.

limit for each MFC was set in LabVIEW; if the voltage
dropped below this threshold, a Python script was activated
that moved the head of the robot over the food source,
drew 3 mL of substrate (carbon fuel) and then injected it into
the MFC.

After the feed substrate had been deposited into the anode,
the syringe module went through a wash cycle in which
3.5 mL of 70% ethanol was drawn into the syringe and
disposed of down a waste tube on the EvoBot platform,
followed by 3.5 mL of sterile distilled H2O, which was
likewise drawn into the syringe and disposed of down a waste
tube before the robot head returned to the home position. At
the home position, the feed function paused for 60 minutes to
allow the MFC to stabilise and the voltage to rise
above the threshold.

A cathode hydration cycle was also incorporated as a
separate function in LabVIEW, which activated a Python script
every 4 hours. This script moved the robot head over the
position of each MFC and deposited 3.12 mL of deionised
water into the cathode chamber before returning to the home
position.
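
The feeding and hydration logic described above can be summarised in a short sketch. It mirrors the timing given in the text but is not the authors' implementation, which split the work between LabVIEW and Python scripts; the class FeedController, its step method, and the action names are hypothetical, and the sketch handles a single MFC for simplicity. The 80 mV threshold is the value given in the caption of Figure 6.

```python
# Simplified, hypothetical model of the feed/hydrate scheduling for one MFC.
from __future__ import annotations

from dataclasses import dataclass

SAMPLE_PERIOD_S = 60           # voltage sampled every 60 s (from the text)
FEED_PAUSE_S = 60 * 60         # 60 min pause after feeding (from the text)
HYDRATION_PERIOD_S = 4 * 3600  # cathode hydration every 4 h (from the text)
THRESHOLD_MV = 80.0            # feeding threshold (Figure 6 caption)


@dataclass
class FeedController:
    threshold_mv: float = THRESHOLD_MV
    next_feed_allowed_s: int = 0

    def step(self, now_s: int, voltage_mv: float) -> list[str]:
        """Return the actions to run for one voltage sample taken at now_s."""
        actions = []
        if now_s > 0 and now_s % HYDRATION_PERIOD_S == 0:
            actions.append("hydrate")
        if voltage_mv < self.threshold_mv and now_s >= self.next_feed_allowed_s:
            # Feeding is always followed by the ethanol/water wash cycle.
            actions += ["feed", "wash"]
            self.next_feed_allowed_s = now_s + FEED_PAUSE_S
        return actions


# Three hours of a starved cell held at 70 mV: one feed per hour at most.
controller = FeedController()
feed_times = []
for now_s in range(0, 3 * 3600, SAMPLE_PERIOD_S):
    if "feed" in controller.step(now_s, voltage_mv=70.0):
        feed_times.append(now_s)
print(feed_times)  # [0, 3600, 7200]
```

The 60-minute pause acts as a refractory period: without it, a cell whose voltage recovers slowly would be fed again at the very next sample.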

For the EvoBot experiments, the ‘wash cycle’ with ethanol
and deionised water was deemed necessary in order to avoid
cross-contamination between the different carbon-energy
sources (acetate, lactate and cellulose); it was not
necessary for the replica bench experiment, which
was carried out manually.

Results 

The EvoBot experiment lasted approximately 8 weeks and the
replica 4 weeks; comparisons between the experiments are made
for the same 4-week period. The power output of one MFC fed
by EvoBot and the feeding events are shown in Figure 6.

A significant performance difference was found between
the replica experiment and the EvoBot experiment; the
replica experiment (data not shown) showed higher power


Figure 6: Power output from one of the MFCs (MFC2) fed 
by the Evobot with sodium acetate. The red arrows indicate 
the points where the voltage output of the MFC dropped be¬ 
low the 80mV threshold, which was the trigger for feeding 
the MFC. The dotted lines indicate the periods during which 
three different concentrations of sodium acetate were tested. 


output levels from the MFCs, compared to those from the
MFCs on EvoBot. This could have been due to the wash
cycle with ethanol, which would inevitably have left residual
ethanol in the syringe during the course of the experiment;
the replica experiment did not require a cleaning cycle.
Also, the sieving of the sludge for the inoculation of the
MFCs, which was done to prevent the syringe needle from
blocking, may well have resulted in a less enriched inoculum;
again, this was not an issue for the parallel bench
experiment.

Nevertheless, with automated feeding pulses dictated by
the voltage threshold, the behaviour of an MFC could be
monitored more closely, "day or night", in a way that would
otherwise require an operator to be continuously present.
In addition, the automated hydration cycle was advantageous,
since it helped us identify empirically the aqueous O2
saturation levels for the oxygen-reduction-reaction (ORR)
that is necessary for the open-to-air cathodes. In other
words, beyond this ‘performance saturation’ point, the
addition of more water did not result in increased MFC
performance. This is shown in Figure 7, which illustrates
the behaviour of the acetate-fed MFC from days 31-34 (data
taken from Figure 6). The fluctuating electrical output is
a response to the cathode hydration; however, as can be seen,
the overall performance during that feeding cycle remains
the same (on average).
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Figure 7: Power output from the same MFC2, following 
cathode hydrations by Evobot, during one feeding cycle be¬ 
tween days 31-34 of the experiment. Black arrows indicate 
points of EvoBot hydration; the red arrow indicates a point 
of feeding. 

Future Work 

We are currently working to address the issues caused by
the ‘wash cycle’. A simple option would be to increase the
number of times that the syringe is washed with distilled
water, but we are also looking into other techniques
to avoid cross-contamination. First, a dispensing module is
under development, which uses external pumps to provide 
pure reagents or solutions. Thus, different organic materials 
could be used to feed the MFCs without having to use the 
same syringe. Additionally, we are also starting to use dis¬ 
posable pipette tips, but they have to be placed in the syringe 
manually. We hope to automate the change of these pipette 
tips in the future. 

Regarding the interactive experiments, they are now based 
on the voltage of the MFCs. However, we are planning to 
extend the robot with sensing modules for measuring more 
parameters. As an example, we are developing a pH mod¬ 
ule, which will allow us to control the pH of the MFCs by 
adding acid or alkaline solutions. Our intention is that new 
interactive experiments will allow us to pose different 
scientific questions, obtain novel data never recorded before 
and help us develop better MFCs. 

Conclusions 

The EvoBot robot is an open-source, modular liquid han¬ 
dling robot. Our design focuses on high quality, relying 
on open-source components and software, and being eas¬ 
ily reconfigurable and extendable. Thus, EvoBot provides a 
versatile and low cost tool to carry out research in multiple 
fields. In this paper we have described the overall 
implementation of the mechanical structure, consisting of 
layers and modules, and documented the performance of the 
robot, which is easily within the parameters required for a 
wide range of liquid-based experiments. In particular, a novel 
experiment with MFCs has been carried out, in which the robot 
interactively nurtures an MFC based on its voltage. We hope 
that the unique features of this platform can form the basis 
for new lines of research in artificial life. 

Introduction 

A key objective of artificial life research is grasping how 
non-living matter can recreate the essential properties of life 
(Hanczyc et al. 2015). An understanding of the essential 
properties of life not only may result in synthesizing artifi¬ 
cial life and self-reproduction (Bedau et al. 2010), but may 
also lead to understanding the complexity of natural life. In 
this paper we focus on movement of droplets in concentra¬ 
tion gradients, which can be viewed as a simplistic model 
of life that mimics the behavior of cells that move away 
from their metabolic products into regions with fresh nutri¬ 
ents (Hanczyc et al. 2007). 

Droplet experiments require precise positioning of 
reagents with respect to stationary or even motile droplets. 
This makes them difficult to replicate by hand because hu¬ 
mans introduce systematic errors and noise. This originates 
from the fact that humans are unable to precisely perceive 
spatial distances and dynamics. Even if they could, placing 
reagents at a known distance by hand is imprecise. Further¬ 
more, the angle of the pipette tip, the time to dispense liquid, 
the force of dispensing, and the distance between pipette tip 
and liquid surface are parameters that could potentially af¬ 
fect the experiment. In addition, over the course of long 
experiments, human exhaustion could further affect the ex¬ 
periment. 

The confluence of computer vision and robotic automa¬ 
tion makes automation of these experiments possible. Real¬ 
time analysis of experimental images provides data about 
droplet properties and behavior. The precision of robot 
automation enables a significant reduction of noise related 
to the positioning of reagents, and control over parameters 
such as dispensing angle, dispense time, dispensing force, 
and distance from the pipette tip to the liquid surface. It 
is also possible to run reactive experiments, because the 
data obtained from the computer vision system may trigger the 
robot to perform a specific action. Hence, the introduction 
of automation makes it possible to control experimental 
parameters precisely and to significantly reduce noise, 
leading to improved statistical significance of results. 

In this paper we automate an experiment whose purpose 
was to understand the response of a droplet as a function of 
distance to a reagent (Cejkova et al. 2014). Chemists did this 
experiment by hand, but given that they were not able to place 
reagents precisely, they had to rely on intuition and luck to 
get sufficient coverage of distances. Whether they were 
successful could only be verified after the experiments were 
performed, by analyzing the experiment videos. In contrast, 
by employing computer-vision-driven automation it is easy to 
ensure systematic coverage of relevant distances, and we were 
also successful in reducing the noise of the experiments. 

Figure 1: EvoBot (top); bottom camera view of a decanol 
droplet on a microscope slide (bottom). 

Implementation 

The robot we have developed, EvoBot, shown in Figure 1 
(top), is one possible design meeting the image processing 
and robotic automation requirements to perform automated 
artificial chemical life experiments (Faina et al. 2016). 
EvoBot is designed in three layers. The experimental layer is 
an arena that holds microscope slides, Petri dishes or other 
reaction vessels. On top of the experimental layer is the 
robot head, accommodating up to 11 syringe modules to absorb 
or dispense liquid. Under the experimental layer is the 
sensor layer, where the camera is placed. 

Various droplet data collected from the image analysis of 
the camera is used as feedback to make artificial life exper¬ 
iments possible. The camera provides data about droplet 
properties, such as position, speed, area, and color, or about 
change in droplet behavior, such as droplets merging or 
dividing, clustering or declustering. Based on this data, 
EvoBot will interact with the experiment. 

EvoBot can be used to automate a variety of artificial life 
experiments (Nejatimoharrami et al. 2016). We have used 
EvoBot to track motile droplets and interact with the 
experiment, e.g., absorbing a droplet when its speed goes 
below a threshold. We have also used EvoBot to detect when a 
collection of droplets behaves in a certain way, e.g., 
droplets clustering, and to interact with the experiment 
accordingly, e.g., by injecting a reagent at a certain 
distance from the droplet cluster. Another application of 
EvoBot is 2D or 3D positioning of droplets at precise 
positions or in specific patterns forming complicated 
geometric shapes. 
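
As an illustration of such a reactive rule, the following minimal sketch estimates droplet speed from consecutive tracked centroids and reports the first frame at which the speed falls below a threshold. The function names, the units and the sample track are assumptions made for this example; they are not part of the EvoBot software.

```python
import math


def droplet_speed(p_prev, p_curr, dt_s):
    """Speed between two tracked centroid positions (same length unit)."""
    dx = p_curr[0] - p_prev[0]
    dy = p_curr[1] - p_prev[1]
    return math.hypot(dx, dy) / dt_s


def first_slow_frame(track, dt_s, speed_threshold):
    """Index of the first frame whose speed is below the threshold, else None.

    Hypothetical trigger: in a reactive experiment this is the frame at
    which the robot would be asked to act (e.g. absorb the droplet).
    """
    for i in range(1, len(track)):
        if droplet_speed(track[i - 1], track[i], dt_s) < speed_threshold:
            return i
    return None


# Made-up centroid track, one position per frame, for illustration only.
track = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (6.3, 8.4), (6.3, 8.4)]
print(first_slow_frame(track, dt_s=1.0, speed_threshold=1.0))
```

A real tracker would typically smooth the speed over several frames before triggering, since single-frame estimates are sensitive to segmentation noise.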

Experiment Results 

To verify the reduction in variability of reactive artificial 
life experiments, we used EvoBot to duplicate the experiment 
reported by Cejkova et al. (2014), as shown in Figure 1 
(bottom). The experiment was performed 15 times with 1 mL of 
10 mM sodium decanoate, a 5 µL droplet of decanol, and a 
10 µL droplet of a 1 M NaCl solution (i.e., 10 µmol of NaCl) 
dispensed at a distance of 50 mm from the decanol droplet. 

Figure 2 shows the position of a decanol droplet over time, 
its induction time, i.e., the delay between NaCl addition and 
the start of its chemotaxis, and chemotactic droplet speed. 
The results of our experiments verify the behavior of decanol 
droplets in the presence of salt concentration gradients 
observed by Cejkova et al. (2014). Comparison of the results 
obtained from pipetting by hand and by the robot shows a 
reduction in the variability of the results from the robot, 
in particular a decrease of about 26% in the coefficient of 
variation of the induction time. 
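
For readers unfamiliar with the metric, the coefficient of variation is the sample standard deviation divided by the mean. The sketch below shows how a relative reduction of this quantity could be computed; the sample values are made-up placeholders and are not measurements from the experiment.

```python
from statistics import mean, stdev


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean."""
    return stdev(samples) / mean(samples)


def relative_cv_reduction(manual, robot):
    """Fractional decrease of the CV when going from manual to robot runs."""
    cv_manual = coefficient_of_variation(manual)
    cv_robot = coefficient_of_variation(robot)
    return (cv_manual - cv_robot) / cv_manual


# Placeholder induction times in seconds (NOT data from the paper).
manual_times = [10.0, 20.0, 30.0]
robot_times = [15.0, 20.0, 25.0]
print(relative_cv_reduction(manual_times, robot_times))
```

Because the coefficient of variation is dimensionless, it allows the spread of induction times to be compared even when the mean values of the two series differ.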


Conclusion 

In this work, we described how we applied computer vision 
and automation to improve an artificial chemical life exper¬ 
iment. For this experiment the use of the robot enabled sys¬ 
tematic control of experimental parameters resulting in less 
noisy experimental data. Overall, we conclude that com¬ 
puter vision and robot automation make it possible to per¬ 
form chemical experiments where precise spatial placement 
and timing are important. 

Abstract 

We demonstrate a new scheme of artificial life, which drives 
the temporal evolution of the distribution of real living cells 
by giving programmable interactions to the cells. Using optical 
interlink feedback, two groups of isolated micro-algae cells 
were made to interact with each other artificially. The 
micro-algae cells responded to illumination patterns produced 
by artificially designed algorithms, leading to the autonomous 
evolution of cell distribution in micro-aquariums. Habitat 
domain separation and autonomous oscillation of cell density 
were realized with the interlink feedback. In habitat domain 
separation, the initial fluctuation of the cell density 
distribution grew under the interlink feedback, accompanied by 
clustering of high-density areas. In autonomous oscillation, 
the photo-responses of the two micro-algae determined the 
period and waveform of the oscillation. 

Introduction 

Various life-based phenomena derive from interactions among 
different living species. For example, when two microbial 
species interact repulsively with each other, the habitat 
domains of the species separate spontaneously, resulting in 
habitat domain separation (Schloter, 2000). One promising way 
to investigate such life-based phenomena and their temporal 
evolution is to compose a hybrid system of living cells and a 
computer system that generates external stimuli. The natural 
behavior of the living cells is affected by the external 
stimuli generated in a programmable manner, leading to the 
spontaneous evolution of cell distribution through the 
artificially designed interaction between the cells. 

Here we describe our artificial interlink feedback system, 
which confines motile and photo-responsive micro-algae cells 
in two separate micro-aquariums. By utilizing the photophobic 
responses of the micro-algae cells, a controlled interaction 
between two groups of micro-algae cells was achieved: the 
irradiation of light patterns, dynamically produced according 
to a designed algorithm, evokes the photophobic responses of 
the cells. Autonomous domain separation and oscillation 
between the two groups of micro-algae cells were demonstrated 
as the result of the artificial interlinking. 

Experiment 

A single optical feedback system was composed of a 
microscope, objective lens, video camera, data processing PC, 
LC projector, and reduction lens system. The details of the 
system can be found in our previous report (Ozasa 2013). 
Micro-algae cells were confined in micro-aquariums, which 
had 5x5 squares of 480 µm in width, connected to 
neighboring squares by paths of 90 µm in width, as shown 
in Fig. 1. The micro-aquarium was 120 µm deep. Several 
hundred cells of Euglena gracilis or Chlamydomonas 
reinhardtii were used for one micro-aquarium. The cell 
activity in the 25 individual squares was evaluated as a set 
of values of a new measure, "trace momentum (TM)". Based on 
the set of TM values, illumination patterns were produced 
according to a designed algorithm and projected onto the 
micro-aquarium. The illumination evoked the photophobic 
responses of the cells, and the spatial cell distribution 
changed through the photo-responses of the cells. The 
feedback time step was approximately 1.47 s/cycle. 

The artificial interlinking between two sets of the above 
optical feedback systems was realized by exchanging their TM 
data sets with each other. Two types of feedback algorithm 
were used, one for habitat domain separation and one for 
autonomous oscillation. For the habitat domain separation, the 
illumination intensity for each of the 25 squares was set 
proportional to the normalized TM of the correspondingly 
positioned square in the counter micro-aquarium. 
For the autonomous oscillation, the 25 squares were divided 
into two groups of 12 fixed squares, and the illumination of 
the 12 squares in a group was switched on/off all together, on 
the condition that the group TM of the corresponding group in 
the counter micro-aquarium exceeded a prefixed threshold 
ratio of the grand total TM. For simplicity, we denote one of 
the two micro-aquariums as A, and the other as B. 
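
The two feedback rules can be sketched as follows. This is a minimal illustration under stated assumptions: the normalization of TM is taken here to be division by the sum over all 25 squares, the function names and the choice of group indices are hypothetical, and the second function only reports whether the threshold condition is met without deciding whether that maps to switching the light on or off.

```python
def separation_illumination(tm_counter, max_intensity=1.0):
    """Illumination per square, proportional to the normalized TM of the
    correspondingly positioned square in the counter micro-aquarium.

    Assumption: "normalized" means divided by the total TM of all squares.
    """
    total = sum(tm_counter)
    if total == 0:
        return [0.0] * len(tm_counter)
    return [max_intensity * tm / total for tm in tm_counter]


def oscillation_condition(tm_counter, group, ratio_threshold):
    """True if the TM summed over `group` (square indices in the counter
    micro-aquarium) exceeds `ratio_threshold` of the grand total TM."""
    total = sum(tm_counter)
    if total == 0:
        return False
    group_tm = sum(tm_counter[i] for i in group)
    return group_tm / total > ratio_threshold


# Illustrative values only: uniform activity in all 25 squares.
tm_b = [1.0] * 25
group_1 = list(range(12))  # hypothetical choice of 12 fixed squares
print(separation_illumination(tm_b)[:3])
print(oscillation_condition(tm_b, group_1, ratio_threshold=0.3))
```

In the interlinked setup each function would be evaluated once per feedback cycle, with the TM set measured in one micro-aquarium driving the projector of the other.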

Results and Discussion 

We used E. gracilis for both micro-aquariums A and B in 
the habitat domain separation experiment. After the interlink 
feedback started, the number of squares occupied by 
micro-aquarium A (i.e., squares in which A had the larger 
normalized TM) increased gradually, exceeding that of 
micro-aquarium B until a time step of 1200, as shown in 
Fig. 1(a). After many transitions, the number became balanced 
between micro-aquariums A and B at the end of the experiment. 
Figure 1(b) shows the final trace image obtained at a time 
step of 4000. The squares with a high cell density clustered 
into a larger domain. A few squares with a high cell density 
remained un-clustered, probably because the surrounding 
squares were taken by the counter micro-aquarium. 



Figure 1: Trace images of micro-aquariums A and B observed at 
time steps of 1200 (a, top) and 4000 (b, bottom). The traces 
of swimming cells are displayed as red lines. The illumination 
intensity for each square is superimposed in blue. 

The habitat domain separation realized in the experiments 
was caused by the enhancement of initial fluctuation of cell 
density among the squares, and by the clustering of the 
squares of higher cell densities. The result in this experiment 
can be considered as artificial community formation among 
the real living cells. Many types of artificial clustering can be 
produced in the time course of the experiment, by modifying 
the interlink feedback algorithm. 

For the autonomous oscillation experiments, C. reinhardtii 
cells were confined in micro-aquarium A, whereas E. gracilis 
cells were confined in B. The photophobic response of 
E. gracilis cells is to escape from the light, whereas that 
of C. reinhardtii is to be activated by the light. 

Figure 2 shows the temporal change of cell distribution and 
light illumination for each micro-aquarium, observed with a 
prefixed threshold value R = 0.3. The cell distribution is 
represented by A1 - A2 or B1 - B2 in Fig. 2, which is the TM 
difference between the illuminated and un-illuminated square 
groups. An autonomous oscillation with a period of approx. 
5.9 min was observed, showing that the natural photo-response 
of each micro-alga generated the cell density oscillation via 
the artificially produced interlinking. The cells of 
E. gracilis migrate out of the illuminated area, whereas 
those of C. reinhardtii are activated in the illuminated area. 
This difference in survival strategy between the two 
micro-algae species determined the period and waveform of the 
oscillation. When the prefixed threshold value R was reduced 
to 0.1, the oscillation period became approx. 1.9 min, 
indicating that the oscillation can be tuned by this parameter. 

The new scheme of artificial life demonstrated here can 
also be used for soft-computing by employing artificial optical 
stimulation with light patterns. The natural cell behavior 
based on their survival strategy will affect the soft-computing 
process, which will be useful for solving combinatorial 
optimization problems or network flow problems. 



Figure 2: Autonomous oscillation observed with interlink 
feedback. TM differences between two groups of 12 squares 
were plotted for the micro-aquarium A with C. reinhardtii 
(upper) and B with E. gracilis (lower), together with 
illumination switching On/Off. TM differences were vertically 
shifted for clarity. The interlink feedback was active for 
time steps of 500-3500. The prefixed threshold level was 30% 
in this experiment. 

Conclusion 

A new scheme of artificial life has been demonstrated, which 
drives the temporal evolution of the distribution of real 
living cells by giving programmable interactions to the cells. 
The habitat domain separation found in nature was reproduced 
with this scheme using two isolated groups of E. gracilis in 
the micro-aquariums. The two cell groups were coupled through 
an artificially designed algorithm so that each avoided areas 
of high cell density in the counter micro-aquarium. The 
habitat domain separation developed gradually as reversed 
illumination patterns, starting from the initial cell density 
fluctuation and accompanied by the clustering of the 
un-illuminated squares with higher cell densities. We also 
demonstrated autonomous oscillation of 
cell density for two different species of micro-algae, E. 
gracilis and C. reinhardtii. The oscillation period and 
waveform were determined by the difference of photo¬ 
response between the two micro-algae. The study shows that 
various interactions between living cells can be realized with 
artificially designed interlink feedback algorithms, which will 
contribute to developing artificially interlinked life systems 
and soft-computing with living cells. 

Abstract 

Microscopic crowd simulation usually uses ad-hoc models. 
While these have been proven to be useful, they are diffi¬ 
cult to calibrate and do not always reflect real behaviour. For 
this reason we propose a machine learning approach using 
neural networks. The main contribution of the project is a 
first exploration of prediction of agent trajectories using two 
specific types of neural networks, Support Vector Machine 
(SVM) and Spiking Neural Networks (SNN). 

Introduction 

Developing a computing model for simulating the behaviour 
of large groups allows us to study the movement of people 
and improve buildings, plan logistics in health, retail and 
other services, or even avoid traffic jams. Also, these models 
allow us to reduce risks during natural disasters and medical 
emergencies. 

There are two ways in which crowd simulation is usually 
carried out: macroscopic and microscopic (Rivas et al., 
2014). Macroscopic simulation describes global interactions 
with the environment and the crowd itself. In contrast, 
microscopic simulation exposes the interactions between 
individuals within a group; each agent is processed 
individually to simulate a crowd. In particular, Agent-Based 
Models (ABM) simulate the actions and interactions 
of autonomous agents that generate global-scale behaviours 
(Bonabeau, 2002). One of the behaviors of such autonomous 
agents is to decide where to move next based on their 
perception of the environment. For this purpose, the usual 
methods in microscopic crowd simulation are ad-hoc models 
such as steering rules, social forces, geometric methods such 
as ORCA, or even approaches based on synthetic vision. 

All these have been proven to be useful, but they are 
difficult to calibrate to reflect real behaviour. For this 
reason we propose using real data and a machine learning 
approach. We use trained neural networks to decide what next 
step a simulated agent should take, in effect using a 
prediction of its trajectory. In the following sections we 
explore some background: crowd behavior models and artificial 
neural networks (in particular spiking neural networks, SNN, 
and support vector machines, SVM, which are the ones we will 
use). 

In previous work we showed the efficiency of third generation 
neural networks (SNN and SVM) over other Artificial Neural 
Networks (ANN) in solving classification problems for 
separable and non-separable data. Moreover, the processing 
speed for training and execution improves when these methods 
are parallelized (Paz et al., 2014). This is the main reason 
why in that article we selected both methods for use in 
classification. The most popular methods for obstacle 
avoidance and trajectory prediction have been tried by Huang 
et al. (2016) and Abbeel et al. (2008); these papers evaluate 
local costs to obtain the least computationally expensive 
trajectory (A*). Natural movement of crowds cannot always be 
obtained by simply evaluating costs locally, although for 
single agents it could work. Real people can see further than 
what we can cheaply evaluate during simulation. This is the 
main advantage of using real data to train ANNs instead of 
locally evaluating ad-hoc models. Our main contribution is to 
show that we obtain trajectories that avoid obstacles and 
produce what seems to be a more natural movement by using a 
prediction system that applies third generation neural 
networks to characters in crowds and to individuals. That is, 
the method can use the knowledge embedded in the trajectories 
followed by real people, which is not explicitly present when 
decisions are taken by evaluating costs locally. 

In the rest of this paper we will explore trajectory pre¬ 
diction, perform some experiments, obtain some results and 
come to some conclusions. 

Background 

Reynolds (1987) proposed three basic rules for the behavior 
of members of crowds: separation, alignment, and cohesion. 
These remarkably simple rules keep a group of boids together, 
give them a direction of movement and keep them free of 
collisions. Another important approach is social forces 
(Helbing and Molnar, 1995), where agent behavior is based on 
a collection of forces, called social forces. These can be 
attractive or repulsive. In its simplest form, a pedestrian 
can use these forces to get to its destination and avoid 
obstacles or other pedestrians. In predictive/velocity-based 
models (Paris et al., 2007), agents calculate the velocities 
necessary to avoid collisions. Among these velocities they 
can choose how to move to their goal while avoiding 
collisions. This concept is expanded by van den Berg et al. 
(2008, 2011), who introduce the notions of Velocity Obstacles 
(VO) and Reciprocal Velocity Obstacles (RVO) and the notion 
of Optimal Reciprocal Collision Avoidance (ORCA). Lastly, 
synthetic vision (Ondrej et al., 2010; Moussaid et al., 2011) 
is based on using visual information obtained from the 
perception of the environment to get a safe, collision-free 
trajectory to the agent's goal. 
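
To make the three steering rules of Reynolds (1987) concrete, the following minimal 2D sketch computes the separation, alignment and cohesion terms for one agent. The neighbourhood radius, the minimum distance and the way each term is formed are common textbook choices assumed for this illustration; they are not taken from any of the cited implementations.

```python
import math


def boids_steering(i, positions, velocities, radius=5.0, min_dist=1.0):
    """Return (separation, alignment, cohesion) vectors for agent i.

    Assumptions: neighbours are agents within `radius`; separation pushes
    away from neighbours closer than `min_dist`; alignment is the mean
    neighbour velocity minus the agent's own; cohesion points from the
    agent to the neighbours' centroid.
    """
    px, py = positions[i]
    sep = [0.0, 0.0]
    vel_sum = [0.0, 0.0]
    pos_sum = [0.0, 0.0]
    n = 0
    for j, (qx, qy) in enumerate(positions):
        if j == i:
            continue
        dx, dy = qx - px, qy - py
        d = math.hypot(dx, dy)
        if d > radius:
            continue
        n += 1
        vel_sum[0] += velocities[j][0]
        vel_sum[1] += velocities[j][1]
        pos_sum[0] += qx
        pos_sum[1] += qy
        if 0 < d < min_dist:
            # Unit vector pointing away from the too-close neighbour.
            sep[0] -= dx / d
            sep[1] -= dy / d
    if n == 0:
        zero = (0.0, 0.0)
        return zero, zero, zero
    alignment = (vel_sum[0] / n - velocities[i][0],
                 vel_sum[1] / n - velocities[i][1])
    cohesion = (pos_sum[0] / n - px, pos_sum[1] / n - py)
    return (sep[0], sep[1]), alignment, cohesion


positions = [(0.0, 0.0), (0.5, 0.0), (2.0, 0.0)]
velocities = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(boids_steering(0, positions, velocities))
```

A full simulator would combine the three vectors with hand-tuned weights, which is exactly the calibration burden that motivates learning the behavior from data instead.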

All these models predict individual trajectories and 
collective patterns of motion and can have relatively good 
quantitative agreement with a large variety of empirical and 
experimental data. A deep study of this topic is beyond the 
scope of this section; we refer the reader to our previous 
work (Rivalcoba and Ruiz, 2013). In previous work (Rivas et 
al., 2014), a group of real people was coupled with virtual 
humans to get a plausible reaction to real people. The 
simulation of virtual humans was based on social forces. 
However, in this paper we want the agents to exhibit behavior 
learned from real examples, since this should lead to more 
natural behavior without the tedious process of hand 
calibration of rules or forces. There is some work in this 
direction; for example, Rodriguez et al. (2011) present a 
crowd analysis algorithm that uses behavior priors learned 
from a database of crowd videos gathered from the Internet. 
The algorithm learns a set of crowd behaviour priors off-line. 
Behaviour is compared in order to validate their data-driven 
crowd model. One strategy to reduce the complexity is to 
search in a local space. They adopt a linear Kalman filter to 
evaluate tracking. The trajectory error is presented; 
however, the efficiency of the method, which is important to 
know, is not discussed. 

Wang et al. (2016) propose a new approach based on 
finding path patterns in both real and simulated data in order 
to analyse and compare them, using unsupervised clustering 
by non-parametric Bayesian inference. They take both the 
global and local properties of crowd motion into account when 
analysing the data. The authors consider the state of the 
agents (position and orientation), the state of space, a 
probability over the path, and the path pattern. They simulate 
64 agents with obstacles, and one important contribution is 
that they obtained several crowd path patterns in some 
environments. However, their method does not directly measure 
individual trajectories and thus does not reflect individual 
visual similarities. In our case, we will measure individual 
trajectories and measure collisions with speed, direction, 
goal and occupation. We believe that an ANN is a less complex 
method and can achieve good results. 


Sujeong Kim and Manocha (2016) present an algorithm 
that takes realistic trajectory behaviors from videos for use 
in simulations, applying statistical techniques to compute 
movement patterns and motion dynamics from noisy 2D 
trajectories extracted from crowd videos in order to generate 
realistic crowd movements performing tasks. The main 
limitation of this approach is that it may not work well if 
the layout of the obstacles in the virtual environment is 
different from that captured in the original video. Sujeong 
Kim and Manocha (2015) apply interactive techniques for 
analyzing crowd videos, combining online tracking algorithms 
from computer vision, non-linear pedestrian motion models from 
computer graphics, and machine learning techniques. They also 
use a Bayesian inference technique to compute the trajectory 
behavior feature for each agent. In this case they do not 
require learning a dataset. Lee et al. (2007) focus on 
learning an agent model that controls the motion of each 
agent in a crowd, based on locally weighted linear 
regression. Their model can learn to imitate rule-based 
flocking or insects. They also use attraction to keep the 
local formation of agents. 

Artificial Neural Networks 

An Artificial Neural Network is a system composed of 
simplified abstractions of neurons that is used to solve 
computational problems by imitating the way neurons are fired 
or activated in the brain, in which many neurons work in 
parallel to produce a result. There are three ways a neural 
network can learn: supervised learning, unsupervised learning 
and reinforcement learning. These methods all work by either 
minimizing or maximizing a cost function, but differ in the 
way this cost function is defined. In supervised learning, 
example inputs and the correct outputs are used to train the 
network. Unsupervised learning only uses inputs, and the 
network figures out relationships or categories. A 
reinforcement learning neural network learns from examples of 
actions by evaluating their cost and assigning rewards and 
penalties. Throughout their development, ANNs have been 
evolving towards more powerful and more biologically 
realistic models. In the last decade, the third generation 
Spiking Neural Networks (SNNs) have been developed, which 
consist of spiking neurons. Information transfer in these 
neurons models the information transfer in biological 
neurons, i.e., via the precise timing of spikes or a sequence 
of spikes. The addition of the temporal dimension for 
information encoding in SNNs yields new insight into the 
dynamics of the human brain and has the potential to result in 
compact representations of large neural networks. As such, 
SNNs have great potential for solving complicated 
time-dependent pattern recognition problems defined by time 
series because of their inherent dynamic representation. The 
two methods that we considered to predict the trajectory are 
Spiking Neural Networks (SNN), mentioned above, and Support 
Vector Machines (SVM). Both are highly parallelizable using 
GPUs (Tabarez-Paz et al., 2013). 
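
To illustrate what encoding information in spike timing means, the sketch below simulates a single leaky integrate-and-fire neuron with simple Euler integration. The neuron model, the parameter values and the function name are assumptions chosen for this example; the text does not specify which spiking neuron model is used.

```python
def lif_spike_times(input_current, dt=1.0, tau=10.0, v_thresh=1.0, v_reset=0.0):
    """Spike times of a leaky integrate-and-fire neuron (Euler integration).

    Membrane update per step: v += dt * (-v / tau + I). When v reaches
    v_thresh, a spike is recorded at that time and v is reset.
    """
    v = v_reset
    spikes = []
    for step, current in enumerate(input_current):
        v += dt * (-v / tau + current)
        if v >= v_thresh:
            spikes.append(step * dt)
            v = v_reset
    return spikes


# A constant weak input produces regularly spaced spikes; a stronger
# input makes the first spike arrive earlier.
print(lif_spike_times([0.3] * 12))
print(lif_spike_times([0.6] * 4))
```

The stronger input makes the neuron fire sooner, so the time of the first spike carries information about the input magnitude, which is the kind of temporal code referred to above.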

Comparison between SNN and SVM 

An advantage of SVM is that it finds the global minimum, 
whereas a typical ANN finds a local minimum, for separable 
and non-separable data (Burges, 1998; Cortes and Vapnik, 
1995). The VC dimension of SVM is infinite; however, its 
computational complexity is high. According to Yang et al. 
(2009), SVM has $O(n^3)$ computational complexity and 
$O(n^2)$ memory complexity in the training phase, where $n$ 
is the training size. Furthermore, the number of support 
vectors, which equals the number of neurons in the hidden 
layer, grows linearly with $n$, and the computational 
complexity is $O(n)$, where the kernels are given by the 
training inputs (Horvath, 2003). This implies a limitation 
for real-world applications, whose training size is typically 
far beyond thousands. However, the complexity of the network 
architecture is independent of the dimension of the 
hyperplane. Although back-propagation converges to a local 
minimum, the solution is related to the number of hidden 
layers and the number of neurons in the hidden layers. 
According to Suykens et al. (2002), SVM solutions are 
characterized by convex optimization problems, up to the 
determination of a few additional tuning parameters. In 
contrast, for an SNN in discrete time the computational 
complexity depends on the discrete time $t(n)$: the time 
complexity is $O(t(n))$, where $n$ is the discrete time. 

In the case of continuous time, the computational complexity 
is logarithmic, as highlighted by Maass (1995) in Theorem 1. 

Theorem 1 The VC-dimension and pseudo-dimension of 
any SNN $N$ with piecewise linear response and threshold 
functions, arbitrary real-valued parameters and 
time-dependent weights can be bounded (even for real-valued 
inputs and outputs) by 

$$O\big(|E| \cdot |W| \cdot S \,(\log |V| + \log S)\big)$$ 

if $N$ uses at most $S$ spikes in each computation, where 
$|V|$ is the number of neurons and $|E|$ is the number of 
synaptic connections. The VC-dimension for computations with 
up to $S$ spikes can be as large as $\Omega(|E| \cdot S)$. 

In the training process the data needs to be encoded in the 
time dimension (Ghosh-Dastidar and Adeli, 2009), so it does 
not depend on whether the database has one or more types of 
data (multiclass). Nevertheless, there are many applications 
that have used SVMs, such as database classification, 
prediction, pattern recognition and regression. 

Performance for SVMs is better than typical ANNs in 
large databases for separable and non-separable data. There 
are important differences between these typical ANNs and 
SVM: 


General view of our approach 

This work focuses on the decision-making process of virtual 
agents that must pick a position to go to. These agents 
consider their situation and use a trained ANN to predict this 
next position. The learning process uses data obtained from 
trajectories of real pedestrians, extracted from video. From 
these trajectories we obtain the 2D positions of all 
pedestrians of the crowd in any given frame, and from those 
positions we derive what we call an occupation code. In other 
words, the trajectory database provides the position of each 
person in a given frame, so we know the next position of each 
person and of all people nearby. In this way we obtain the 
occupation code of each agent in each frame. 

With this data, and given an objective and a position we 
train an ANN that is then used for simulation. The simulator 
uses OpenGL and CUDA 7.0 and was tested on a workstation 
containing both a Tesla K40c and a GeForce GTX TITAN. 

The main parameters that we must take into account are 
the position and velocity of the nearby pedestrians, how near 
these pedestrians are, the final objective, and the density of 
the crowd. Our main motivation is to obtain a more natural 
trajectory in a crowd simulation by using the trained neural 
network to predict the next step of each character instead 
of evaluating costs using only local information. 

The trajectories were obtained from third-person top-view 
videos of the crowd, such as those available in Lerner et al. 
(2009). These consist of the control points of the 
Catmull-Rom splines of said trajectories. 

A block diagram of the system can be seen in figure 1.
Catmull-Rom splines representing the trajectories are obtained
from the data collection, and their control points are ordered
by frame number. Since not all characters have control points
in the same frame, we have to interpolate the positions of all
agents for all frames in order to get the occupation code for
every agent. This yields a large table that contains position,
direction, speed, frame, sense and goal. A second algorithm
then uses the interpolated data to compute the occupation code
for each agent. Because repeated occupation codes are obtained,
the table must be simplified. Finally, this data is used to
train the ANN.
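The interpolation step can be illustrated with a minimal sketch.
It assumes the uniform Catmull-Rom formulation and a list of
(frame, position) control points with strictly increasing frame
numbers; the function names and the way the end segments are
handled (repeating the first or last control point) are
illustrative assumptions, not details taken from our
implementation.

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Point on a uniform Catmull-Rom segment between p1 and p2, t in [0, 1]."""
    t2, t3 = t * t, t * t * t
    return tuple(
        0.5 * (2 * b
               + (c - a) * t
               + (2 * a - 5 * b + 4 * c - d) * t2
               + (-a + 3 * b - 3 * c + d) * t3)
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def position_at_frame(controls, frame):
    """Interpolate an agent position at an arbitrary frame.

    controls: list of (frame_number, (x, y)) with strictly increasing
    frame numbers (hypothetical layout of the trajectory data).
    Returns None when the frame lies outside the trajectory.
    """
    for i in range(len(controls) - 1):
        f1, p1 = controls[i]
        f2, p2 = controls[i + 1]
        if f1 <= frame <= f2:
            # Assumption: repeat the end points when a neighbour is missing.
            p0 = controls[i - 1][1] if i > 0 else p1
            p3 = controls[i + 2][1] if i + 2 < len(controls) else p2
            t = (frame - f1) / (f2 - f1)
            return catmull_rom(p0, p1, p2, p3, t)
    return None


controls = [(0, (0.0, 0.0)), (10, (1.0, 0.0)), (20, (2.0, 0.0)), (30, (3.0, 0.0))]
print(position_at_frame(controls, 15))
```

Evaluating such a function for every agent and every frame gives
the per-frame positions from which direction and speed can be
derived.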

Simulating crowd behaviour using Artificial 
Neural Networks 

We have chosen to use SNNs because they are suited to
training on multiclass data with various attributes per instance
and because of their acceptable efficiency, although we also
considered SVM for prediction in order to compare the results.
We call a database multiclass if its data can be classified
into more than one type according to a common characteristic.
For example, Weka 3 provides a free database that can be
downloaded to test classification algorithms for prediction or
clustering: the Iris data set, which is classified into three
categories of flowers according to some common physical
characteristics (Bifet et al., 2015).

Figure 1: Block Diagram of System.

In our case, we have seven classes that are the seven di¬ 
rections of the agents and our dataset is the occupation code, 
with a size proportional to the variety of the cases collected 
from the trajectories in the videos. This data depends on 
some variables such as the speed of the agent, the distribu¬ 
tion of other agents in the viewing area, and the density of 
the crowd. As mentioned above, we use trajectories obtained 
from videos of pedestrians in a crowd, and we interpolate 
the data to get positions per frame of all agents, and from 
these calculate the direction of movement of the agents in 
the previous and following frames, as well as their speed. 
The process consists of codifying the occupation patterns 
around each element in the crowd. The videos and trajecto¬ 
ries are those used by Lerner et al. (2009) and can be seen in 
figure 2. 

First experiment: Results using SNN 

In our first experiment, each row of the input matrix (M) of
the SNN consisted of the occupation code and the goal of one
agent. The output of the ANN is the decision taken for the next
step of the pedestrian. The occupation code was computed using
a radius of view around the agent and an angle of view divided
into seven sectors, as can be seen in figure 3, each of which
represents one bit of the occupation code. We define the radius
of view as the maximal distance that is explored in order to
decide whether the circular sector of the respective bit is
occupied.

The set of seven bits forms an occupation pattern. If there
is a pedestrian in a sector, the sector is occupied and its
bit is one; if the sector is empty, the bit is zero, as in
figure 4.
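As an illustration of this coding, the following sketch computes
a seven-bit occupation pattern for one agent. The 180 degree
angle of view (a semicircle in front of the agent), the ordering
of the sectors from right to left, and all names are assumptions
made for the example; only the seven sectors and the radius of
view come from the description above.

```python
import math


def occupation_code(agent_xy, heading, neighbours, radius,
                    n_sectors=7, fov=math.pi):
    """Return a list of n_sectors bits, 1 where a neighbour is in view.

    heading is the agent's direction of movement in radians;
    fov is the total angle of view, centred on the heading
    (assumed here to be a semicircle).
    """
    bits = [0] * n_sectors
    ax, ay = agent_xy
    for nx, ny in neighbours:
        dx, dy = nx - ax, ny - ay
        dist = math.hypot(dx, dy)
        if dist == 0 or dist > radius:
            continue  # the agent itself, or beyond the radius of view
        # Angle of the neighbour relative to the heading, wrapped to [-pi, pi].
        rel = math.atan2(dy, dx) - heading
        rel = math.atan2(math.sin(rel), math.cos(rel))
        if abs(rel) > fov / 2:
            continue  # outside the angle of view
        # Sector 0 is the rightmost one (assumed convention).
        idx = min(int((rel + fov / 2) / (fov / n_sectors)), n_sectors - 1)
        bits[idx] = 1
    return bits


# Agent at the origin walking "up"; one pedestrian straight ahead,
# one behind and one too far away.
print(occupation_code((0.0, 0.0), math.pi / 2,
                      [(0.0, 1.0), (0.0, -1.0), (5.0, 0.0)], radius=2.0))
```

Only the pedestrian straight ahead sets a bit (the middle
sector); the other two are ignored because they lie outside the
angle or radius of view.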

The results of this experiment using SNN with the codification
explained above are presented in figure 5, which shows two
simulation tests. In the first one (Test 1), all agents start
at random positions on the same line at the bottom and have as
their goal a single point at the top; we see their trace from
start to finish. In the second (Test 2), all agents lie on a
circle, and each has as its goal the point exactly on the
opposite side of the circle; this is usually called the circle
test. Both simulations used 100 agents. By the time the agents
had reached their goals, Test 1 had 396 collisions and Test 2
had 72; most of these collisions were concentrated in a very
few agents.

We performed two further tests. In Test 3, 100 agents are
initially distributed at random over the four sides of a
square, and each has a randomly assigned goal on the side
opposite to its initial position. In Test 4 the agents are
initially distributed on a grid, and each goal has the same x
coordinate as the starting position but a y coordinate on the
opposite side. Test 3 had 17 collisions and Test 4 had 0
collisions.

In order to take the distance to a possible collision into
account, and to reduce the average number of collisions while
improving precision, we tried a different coding using three
radii, as described in the next section.

Second experiment: Results using SNN and SVM 

After obtaining the first results we observed that the
simulated characters detected collisions and avoided them, but
that their reaction was not very natural, in particular when
crowd density increases. We therefore proceeded to use an
expanded coding which consists of dividing the sectors by
three different arcs, with three different radii, as we can
see in figure 6, still dividing each arc into seven sectors.

To build the code we took into account the distance from the
centre of the semicircle to the detected agent, using the three
radii. Each bit that was 1 in the previous coding is replaced
by one of three values according to that distance: 2 if the
distance lies between the middle radius and the full Radius,
1 if it lies between the inner and the middle radius, and 0 if
it lies between 0 and the inner radius, as in figure 6. This
code represents the empty spaces, or steps, that an agent could
move toward its goal according to the predicted step.
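The mapping from distance to code value can be sketched as
follows. The three radii are passed explicitly; the equally
spaced values used in the example call (one third and two thirds
of the full radius) are an assumption for illustration, as are
the function and parameter names.

```python
def distance_level(dist, inner, middle, outer):
    """Map the distance to a detected agent onto the values 0, 1 or 2.

    Returns None when the agent lies beyond the outer radius,
    that is, outside the radius of view.
    """
    if dist < 0 or dist > outer:
        return None
    if dist <= inner:
        return 0   # no free step before the detected agent
    if dist <= middle:
        return 1   # one free step
    return 2       # two free steps


# Assumed, equally spaced radii for a radius of view of 3 units.
for d in (0.5, 1.5, 2.5, 4.0):
    print(d, distance_level(d, inner=1.0, middle=2.0, outer=3.0))
```

With this convention a larger value means more free space in
that sector before another pedestrian is met.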

Additionally, we controlled the rotation of each agent: after
getting the predicted step, the agent rotates toward the goal.
Agent speed is also coded using seven digits. If the speed of
the nearest intruder is greater than that of the main agent,
the speed digit is 3 and the agent can consider that space
empty for the next step; if the speed of the nearest intruder
is the same as that of the main agent, the speed digit is 2 and
the space can also be considered empty; but if the speed of the
nearest intruder is smaller than that of the main agent, the
speed digit is 1 and the agent cannot consider that space empty
for the next step.
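A minimal sketch of this speed coding is given below. The
tolerance used to decide that two speeds are "the same" and the
helper names are assumptions introduced for the example.

```python
def speed_digit(intruder_speed, agent_speed, tol=1e-6):
    """Code the speed of the nearest intruder relative to the main agent."""
    if intruder_speed > agent_speed + tol:
        return 3  # intruder is faster
    if abs(intruder_speed - agent_speed) <= tol:
        return 2  # same speed
    return 1      # intruder is slower


def space_considered_empty(digit):
    """Digits 3 and 2 let the agent treat the sector as empty."""
    return digit in (2, 3)


for intruder in (1.5, 1.0, 0.4):
    digit = speed_digit(intruder, agent_speed=1.0)
    print(intruder, digit, space_considered_empty(digit))
```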

Once the SNN has been trained the simulation is carried 
out as explained before: given an occupation pattern of the 
agents and the objective vector the next position is deter¬ 
mined by the output of the trained neural net. 

If a sector is occupied by an agent whose velocity component
in the direction of the agent being analysed is greater than
or equal to that of this agent, the sector is considered empty
(figure 7).

Figure 2: Images from sample videos. Trajectories derived from
these videos were used as input data.

Figure 3: Proposal 1: Codifying the occupation pattern.

Figure 7: Proposal 2: Cases of codifying.

Figure 8: Test 5, 6

Figure 9: Test 9, 10 SVM: Finishing in the opposite goal
with 100 agents


Second experiment: Results using SNN. In the following
paragraphs we present some results of experiments made with
SNN using the second approach. In figure 8, Test 5, agents
start on the side opposite to the goal; there were 27 collisions
with a standard deviation of 0.93. In Test 6 the circle test
was run, with 49 collisions and a standard deviation of 1.6.
This means that most agents did not take part in any
collision. We used 100 agents with SNN.

We conducted two other tests: Test 7 is like Test 3, and Test
8 is like Test 4. In these cases we had only 4 and 0 collisions,
respectively. Precision was not better than with the first
approach, but most agents ended less than 1 unit of distance
from the goal and there were fewer collisions.

We got better results with this approach than with the first,
so it is this approach that we compare with SVM, as shown in
the next section.

Second experiment: Results using SVM. The same experiments
were run with this second coding approach but using SVM, in
order to test the advantages or disadvantages of SVM with
respect to SNN, such as the ability to find the global minimum
and faster prediction. In figure 9 (Test 9) we got 227
collisions, with 227 agents, and in the circle test (Test 10)
we got 779, with 100 agents. The standard deviations were 8.79
and 15.16, respectively, significantly higher than with SNN.
This means that there are more collisions per agent with SVM
than with SNN. We could probably get better results with other
methodologies derived from SVM, such as SVMlight or SMO
(Joachims, 1999).


We also conducted Test 11, analogous to Test 3, and Test 12,
analogous to Test 4. Collisions measured were 29 with a
standard deviation of 0.52, and 157 with a standard deviation
of 1.02, respectively.

In figure 10 we see an example of our simulation using our
visualization system for diverse animated crowds, in which we
generate varied characters from few assets by combining
geometry from different characters and using shaders. Different
discrete levels of detail are also used, allowing the system to
render crowds composed of up to a quarter of a million
characters; as a reference we suggest the work on hierarchical
level of detail for varied animated crowds by Hernandez and
Rudomin (2011). Likewise, Ruiz et al. (2013) describe
techniques for generating many varied and animatable characters
for crowds with reduced memory requirements, where the use of
texture space techniques allows painless animation transfer
between different characters and between different levels of
detail.

Simulation speed of SVM is better than for SNN. Dis¬ 
tance to the goal from the final position reached in simu¬ 
lation is also less with SVM than with SNN. But the tra¬ 
jectories obtained with SNN were better in avoiding obsta¬ 
cles and seemed to give a more natural-looking behaviour 
of the agents. Also, as we can see in previous work Paz 
et al. (2013), with respect to training time, SVM is faster 
than SNN. However, training is still much slower than sim¬ 
ulation, so the design of an algorithm that learns faster, or 
even in real time, requires parallelization of training. 

Conclusions 

According to our experiments, SNN performs better than SVM at
predicting trajectories for crowd behaviour. In the statistics,
SNN got its best results with the second approach, achieving
significantly fewer collisions. On the other hand, all agents
arrived at their goal with SVM, while the distance to the goal
with SNN was less than 1 in most cases, so it is still possible
to achieve better speed and precision with SNN in future work.

Our model allows only decisions included in the trajectories
contained in our database (Lerner et al., 2009), which consists
basically of moving characters avoiding each other. Within
those limits our model works rather well, in that characters
avoid collisions while getting to their goals. However, we did
not consider large static obstacles, and in preliminary
testing, as expected, the system as presented has limitations
in this case.

Another limitation is that precision depends on efficiency 
of the ANN used; in Sujeong Kim and Manocha (2015) ex¬ 
periments do not work with very dense crowds, and in our 
case we have only tried up to 255 agents. 

Finally, an advantage of simulating with an ANN is that it
was not necessary to carry out mathematical computations, use
any physical or mechanical data, or guess appropriate parameter
settings. However, the results of the method depend very
strongly on the trajectory data used for training, and it
should be possible to get more natural movement if the database
is larger and contains more non-repetitive cases.

Figure 10: Simulation. a), b) Circle Test SNN; c), d) Test SVM
finishing in the opposite goal objective with 100 agents.

With respect to behaviour shown in the test, we can see 
that the trajectories have more natural shapes and form geo¬ 
metric patterns that look like those of real crowds. As future 
work, it is important to compare this methodology with the 
common ad-hoc methods and to train for tests that include 
large obstacles. 

Abstract

Volunteer computing is a form of distributed computing
where users decide on their participation and the amount of
time and other resources they will “lend”. This makes them
an essential part of the algorithm and of the performance of
the whole system. Since this is a socio-technical system, their
participation follows some patterns. In this paper we examine
the results of several volunteer distributed evolutionary
computation experiments and try to find out what those patterns
are and what makes an experiment successful or not, including
the feedback loop that is created between the users and the
algorithm itself.

Introduction 

Some time ago, our group faced the problem of diminish¬ 
ing funds for buying new hardware. This was aggravated 
by the increasing maintenance costs and extended downtime 
resulting from the continuous failures of existing clusters. 
Considering this, we leveraged our experience in the design 
of web applications with JavaScript and other volunteer and 
unconventional distributed evolutionary computing systems 
to design and release a new free framework that would allow 
anyone to create a volunteer distributed evolutionary com¬ 
putation (EC) experiment using cloud resources as servers 
and browsers as clients. This framework was called NodIO
(Merelo et al., 2016). NodIO provides server infrastructure
for volunteer-based distributed evolutionary computing
experiments in the form of a chromosome pool. Clients in
browsers, and any other client using the application
programming interface (API), put chromosomes into this pool and
retrieve them from it, so that it works as a loose, asynchronous
and ad hoc connection among all the clients using it.

This loose connection provides a low-overhead way to 
connect desktop experiments with volunteer-based ones, 
with all of them contributing to the pool, but every one of
them working as a separate island carrying out its own
evolutionary algorithm. This is why NodIO is proposed mainly
as a complement to existing resources such as desktop sys¬ 
tems or laptops. As long as it provides a non-null compu¬ 
tational capability that can help existing resources find the 
solution faster it will have found its purpose. The main 


use case is someone setting up NodIO in the cloud, writ¬ 
ing a fitness evaluation function and running a client from 
his or her own computer, but requesting help in social net¬ 
works for additional resources. That is why, in this system, 
we have considered the whole social aspect in the design, 
with issues related to security, trust and privacy among oth¬ 
ers. The computing system becomes a socio-technical sys¬ 
tem (Vespignani et al., 2009). In this paper, we are going 
to focus on measuring the response of users to experiments, 
that is, the time they spend running it, but at the same time 
we will also focus on the technical aspects of the server and 
how these might change the behavior of users, improving 
the capability of the system. 

Our research group is committed to open science, and we 
think this is a very important part of the techno-social sys¬ 
tem. By being transparent, incentives to cheat are reduced 
and, in fact, we have detected no issue for the time being. 
Next we present the state of the art in web-based volunteer 
computing systems along with attempts to predict and model 
its behavior. 

State of the art 

Volunteer computing involves a user running a program vol¬ 
untarily and, as such, has been deployed in many different 
ways from the beginning of the Internet, starting with the 
SETI@home framework for processing extraterrestrial signals
(Anderson et al., 2002). However, the dual introduction of
JavaScript as a universal language for the browser and of the
browser as a ubiquitous web and Internet client has made this
combination the most popular for volunteer computing
frameworks such as the one we are using here, whose first
version was described in (Merelo-Guervos and Garcia-Sanchez,
2015).

JavaScript can be used for either unwitting (Klein and 
Spector, 2007; Boldrin et al., 2007) or volunteer (Langdon, 
2005; Merelo et al., 2007) distributed evolutionary compu¬ 
tation and it has been used ever since by several authors, in¬ 
cluding more recent efforts (Desell et al., 2008; Duda and 
Dlubacz, 2013; Gonzalez et al., 2008). Many other researchers
have used Java (Chong and Langdon, 1999); others have embraced
peer to peer systems (Jin et al., 2006; Wang and Xu, 2008;
Merelo-Guervos et al., 2012). These computing platforms avoid
the single point of failure that is the server and, once
installed, need no effort to gather new users for an
experiment, but the cost of acquiring new users is high
since they need to be set up.

The number of users is key to the performance of these
systems, but it is also essential to adapt the algorithm itself
to the available resources, as shown by Milani (2004), although
EAs can be readily distributed via population splitting or by
farming out the evaluation to all the nodes available. However,
user churn affects experiment performance (Gonzalez Lombrana
et al., 2010; Nogueras and Cotta, 2015) and also the
performance of the algorithm itself (Laredo et al., 2014). All
these issues imply that the performance of a volunteer system
cannot be measured without first understanding its dynamics.
Initial work was done for peer to peer systems by Stutzbach and
Rejaie (2006) and extended to volunteer computing by Laredo
et al. (2008a,b). A similar study was performed by Martinez and
Val (2015) on the Capataz system; however, in that case the
number of computers used was known in advance and the main
focus was on measuring the speed-up and how job bundling helped
to reduce overhead and enhance performance. In this paper, by
contrast, we use actual volunteers.

Some of the essential metrics in browser-based volunteer
computing experiments, like the number of users or the time
each of them spends in the computation, have only been studied
in a limited way in (Laredo et al., 2014), on the basis of a
single run. Studies using volunteer computing platforms such as
SETI@home (Javadi et al., 2009; Merelo et al., 2008) found that
the Weibull, log-normal and Gamma distributions modeled quite
well the availability of resources in several clusters of that
framework; the shape of those distributions is a skewed bell
with more resources in the low areas than in the high areas:
there are many users that give a small amount of cycles, while
there are just a few that give many cycles.

As far as we know, this paper presents one of the few
experiments that measure the performance of a socio-technical
metacomputer, that is, a spontaneously created parallel
computer that uses social networks for operations such as
gathering new users. Apolonia et al. (2012) used the Facebook
protocol to distribute tasks among the walls of friends,
explicitly using the social network for computing. However,
that work stopped short of relating performance to the macro
measures of the users’ social networks. As in the previous
example, a social network was used to get new network nodes;
in the previous case a web page was used, while Facebook’s wall
was used here.

In our case, social networks are an integral part of the
system and are used to spontaneously obtain users. The
algorithms used, as well as the methodology for gathering
resources, will be described next, together with the results
obtained in this initial setup.

Description of the framework 

In general, a distributed volunteer-based evolutionary
computation system based on the browser is simply a
client-server system whose client is, or can be, embedded in
the browser via JavaScript. Since JavaScript is the only
language that is present across all browsers, the choice was
quite clear. We should emphasize that NodIO is intended as an
auxiliary computing engine rather than the main one, so the
performance of JavaScript as a language is not so important;
even so, we have made a comparison between JavaScript and other
languages (Merelo et al., 2015) which shows that the
performance of JavaScript is comparable to that of other
interpreted languages. Compiled languages would be faster but,
of course, with them it is impossible to gather volunteers
spontaneously and without any installation.

In this sense, in this paper we propose the NodIO framework,
a cloud or bare metal based volunteer evolutionary computing
system derived from the NodEO library, whose architecture has
been developed using JavaScript on the client as well as the
server. All parts of the framework are available under a free
license from https://github.com/JJ/splash-volunteer.

Thus, NodIO architecture has two tiers: 

1. A REST server, that is, a server that includes several
routes that can be called for storing information on the server
and retrieving it (the “CRUD” cycle: create, read, update, and
delete). A JSON data format is used for the communication
between clients and the server. There are two kinds of
information: problem-based, that is, related to the
evolutionary algorithm, such as PUTing a chromosome into the
pool or GETing a random chromosome from it, and information
related to the performance and state of the experiments. The
server also performs logging duties, but the logs are basically
a very lightweight and high performance data storage (Merelo,
2015). The server has the capability to run a single
experiment, storing the chromosomes in a key-value store that
is reset when the solution is found. This store can hold every
chromosome in a particular experiment, or have a finite size
and erase the oldest chromosomes once it has filled to
capacity, acting as a cache. In this paper we will test both
implementations.

2. A client that includes the evolutionary algorithm as
JavaScript code embedded in a web page that displays graphs,
some additional links, and information on the experiment. This
code runs an evolutionary algorithm island that starts with a
random population; after every 100 generations, it sends the
best individual back to the server (via a PUT request) and then
requests a random individual back from the server (via a GET
request). We have kept the number of generations between
migrations fixed since it is a way of finding out how much real
work every client has done.

Figure 1: Screenshot showing the fitness and number of users as
seen by volunteers. The one on the left shows fitness, on the
right the accumulated number of users.

Modeling the performance of a
volunteer-based distributed computer

Table 1: Experiment table, with summary of results.

Experiment             #Runs   Different IPs   Traps
April 4th (4/4)           57             191      40
April 24th (4/24)        231             559      40
July 31st (7/31)          97             179      40
February, cache=128       61              75      50
February, cache=64        61             220      50
February, cache=32        39              86      50

Figure 2: Description of the proposed system. Clients execute a
JavaScript EA in the browser, which, every 100 generations,
sends the best individual and receives a random one back from
the server.
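The interplay of the two tiers can be sketched in a few lines.
The sketch below is written in Python for brevity and keeps the
pool in memory instead of behind HTTP routes; it is not the
NodIO code. The class and function names, the truncation
selection, the one-bit mutation and the replacement of the worst
individual by the immigrant are all assumptions made for
illustration. What it takes from the description above is the
pool that evicts its oldest entries when full, the reset when a
solution arrives, and the PUT-then-GET migration every 100
generations.

```python
import random
from collections import OrderedDict


class ChromosomePool:
    """In-memory stand-in for the server-side key-value store."""

    def __init__(self, max_size=None):
        self.max_size = max_size          # None means an unlimited store
        self.store = OrderedDict()        # chromosome -> fitness, oldest first
        self.solutions_found = 0

    def put(self, chromosome, fitness, is_solution=False):
        if is_solution:
            self.solutions_found += 1     # update the counter ...
            self.store.clear()            # ... and reset the pool
            return
        self.store[chromosome] = fitness
        self.store.move_to_end(chromosome)
        if self.max_size is not None and len(self.store) > self.max_size:
            self.store.popitem(last=False)  # erase the oldest chromosome

    def get_random(self, rng=random):
        return rng.choice(list(self.store)) if self.store else None


def mutate(chromosome, rng):
    """Flip one randomly chosen bit (assumed operator)."""
    i = rng.randrange(len(chromosome))
    flipped = "1" if chromosome[i] == "0" else "0"
    return chromosome[:i] + flipped + chromosome[i + 1:]


def run_island(pool, fitness, n_bits, target, pop_size=128,
               migrate_every=100, max_generations=1000, rng=None):
    """Run one island; returns (solution or None, number of PUTs)."""
    rng = rng or random.Random()
    pop = ["".join(rng.choice("01") for _ in range(n_bits))
           for _ in range(pop_size)]
    puts = 0
    for gen in range(1, max_generations + 1):
        pop.sort(key=fitness, reverse=True)
        best = pop[0]
        if fitness(best) >= target:
            pool.put(best, fitness(best), is_solution=True)
            return best, puts
        parents = pop[:pop_size // 2]
        pop = parents + [mutate(rng.choice(parents), rng)
                         for _ in range(pop_size - len(parents))]
        if gen % migrate_every == 0:
            pool.put(best, fitness(best))       # "PUT" the best individual
            puts += 1
            immigrant = pool.get_random(rng)    # "GET" a random one
            if immigrant is not None:
                pop[-1] = immigrant
    return None, puts


pool = ChromosomePool(max_size=32)
solution, puts = run_island(pool, lambda c: c.count("1"), n_bits=8,
                            target=8, rng=random.Random(1))
print(solution, puts, pool.solutions_found)
```

Because the number of generations between migrations is fixed,
counting PUTs per client gives a direct estimate of the work
that client has done.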

Figure 2 describes the general system architecture and
algorithm behavior. Different web technologies, such as jQuery
and Chart.js, have been used to build the user interface
elements of the framework, a part of which is shown in Figure 1
and should be running at http://nodio-jmerelo.rhcloud.com.

JavaScript is a functional language, so in order to work 
with a problem, a fitness function must be supplied at the 
creation of the algorithm object, called Classic. In this 
case the classical Trap function (Ackley, 1987) has been
used. Depending on the problem, additional handling functions
and configurations can be supplied.
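For readers unfamiliar with this fitness function, a generic
concatenated trap can be sketched as follows (in Python rather
than JavaScript, for brevity). The block length of 4 and the
parameters a, b and z are assumed values chosen for
illustration; the text above only fixes the number of traps, and
the names used here do not come from the NodIO sources.

```python
def trap(block, a=1.0, b=2.0, z=3):
    """Deceptive trap on one block of bits (list of 0/1 values).

    With u ones in the block, fitness falls from a to 0 as u goes
    from 0 to z, then rises to b when all bits are set, so the
    slope leads away from the optimum. a, b and z are assumed.
    """
    length = len(block)
    u = sum(block)
    if u <= z:
        return a * (z - u) / z
    return b * (u - z) / (length - z)


def concatenated_trap(bits, n_traps, block_len=4, **params):
    """Sum of n_traps independent trap blocks of block_len bits each."""
    if len(bits) != n_traps * block_len:
        raise ValueError("chromosome length does not match n_traps")
    return sum(trap(bits[i:i + block_len], **params)
               for i in range(0, len(bits), block_len))


print(concatenated_trap([1] * 160, n_traps=40))  # all ones: global optimum
print(concatenated_trap([0] * 160, n_traps=40))  # all zeros: deceptive attractor
```

Under these assumptions the all-ones string scores b per block,
while the all-zeros string scores a per block and attracts a
hill climber.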

The next Section will describe experiments performed to 
establish a baseline performance and gather initial perfor¬ 
mance results. In the first set of experiments, performed last 
year, we used the 40-trap problem, while the current experi¬ 
ments changed to the more difficult 50-trap in order to com¬ 
pare the performance with a more computationally intensive 
problem. 


Initial experiments were set up using the OpenShift PaaS
(www.openshift.com), which provides a free tier, making the
cost of the whole experiment equal to $0.00. Experiments were
announced through a series of posts on Twitter and, in the
latest case, Telegram, and results were published in
(Merelo-Guervos and Garcia-Sanchez, 2015). For the purpose of
this paper, we repeated the announcement several times through
the month of February of the next year. All in all, we have the
set of runs with the characteristics shown in Table 1. In
general, every experiment took several days. No particular care
was taken about the time of the announcement or the particular
wording. Every experiment consisted in running until the
solution of the trap problem (with 40 or 50 traps, as listed in
Table 1) was found. When the correct solution was sent to the
server, the counter was updated and the pool of solutions was
reset to the empty set. There was no special intention to wait
until all clients had finished; in fact, the islands running in
the browser spill over from one experiment to the next. If an
individual that is close to, or even at, the optimum value is
stored in the cache, the next experiment will be affected and
will require less time to finish. This problem has been
addressed in later versions of the system; however, time to
solution is not the important measure here, where we are
concerned with the computational power of the system
established by the number of volunteers.

Table 1 shows that every experiment included more than 30
runs. The number of different IPs intervening in them varied
from more than one hundred to more than five hundred in the
second experiment, with a number around 50 in the second, and
most recent, batch of experiments.

A summary of the results of each run is also shown in Table 2,
which gives the median number of IPs intervening in each
experiment, the median time needed to finish the experiment and
the median number of HTTP PUTs per IP. The first striking
result is that in all cases, 50% of the experiments involved 5
or fewer IPs. This is consistent with previous results
(Merelo-Guervos and Garcia-Sanchez, 2015), which found 6 to be
the expected number of volunteer IPs. The maximum number of
different IPs for each experiment is also in the same range, of
the order of 10, which is also consistent with prior work and
does not vary across the two different batches.

Table 2: Summary of time per run, number of IPs and number of
PUTs per IP in the initial runs.

Experiment    IPs (median)   IPs (max)   Median time (s)   Median #PUTs
4/4                      5          16              2040             18
4/24                     5          29               732             11
7/31                     5          14               260             23
Cache=128                5          17             222.2            124
Cache=64                 8          38              51.3            100
Cache=32                 6          19              58.9             45

We will have to analyze the median time differently, since
the two batches are solving different problems. In both cases
it varies over a wide range, with median times from around 4
minutes in the best case to roughly 2/3 of an hour in the worst
case. Remarkably enough, the time is more consistent in the
second batch, always around one minute and in two cases even
less, and that happens when the median number of IPs is higher.
It should be noted that while the first batch of experiments
took several days in each case, the second only lasted for a
few hours, with a more continued effort of publicizing it in
social networks. This is especially true in the case of cache
equal to 64, as shown by the high number of volunteers
participating in the experiment. The conclusion is that, in
general, the key factor in the time needed to find the solution
is, as expected, the number of volunteers the experiment is
able to gather on short notice.

The number of PUTs, each one corresponding to 100
generations, is the algorithmic result. It is relatively
unchanged for the first batch, at around 20, that is, 2000
generations or 2000*128 = 256000 evaluations. In this case an
“unlimited” cache was used, with all individuals sent from
clients stored until the end of the experiment. However, we
were also interested in measuring the performance of the
algorithm itself when the cache size is changed, after making
it limited. As can be seen in the table, there is a clear
change in the number of evaluations needed, with smaller cache
sizes producing solutions in fewer evaluations, until for cache
size = 32 it is roughly twice as much as with the previous
problem, with 40 traps. This is a good result and is also
algorithmically consistent with other results obtained using
the same type of problems. Since in this paper we were
interested in leveraging the user’s CPU cycles by improving
the algorithm, a good conclusion is that having a small pool
size helps clients to obtain “good” individuals from the pool,
as opposed to any individual that could be obtained before.
Besides, the cache policy deletes the oldest individuals, which
keeps those in the pool current, helping newcomers and any
other participant obtain the best individuals found in the last
part of the experiment. A limited cache also helps in cases
with a bigger search space or longer running times, in which
the server simply crashed due to lack of RAM.
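The conversion from PUTs to evaluations used above can be
reproduced with a short calculation. The factor 128 is taken
from the product 2000*128 in the text and is assumed here to be
the number of evaluations per generation; the median numbers of
PUTs are those listed in Table 2.

```python
GENERATIONS_PER_PUT = 100        # one PUT every 100 generations
EVALUATIONS_PER_GENERATION = 128  # factor used in the text (assumed population size)


def evaluations(puts):
    """Number of fitness evaluations corresponding to a number of PUTs."""
    return puts * GENERATIONS_PER_PUT * EVALUATIONS_PER_GENERATION


# Median #PUTs per experiment, copied from Table 2.
median_puts = {
    "4/4": 18, "4/24": 11, "7/31": 23,
    "Cache=128": 124, "Cache=64": 100, "Cache=32": 45,
}

print("20 PUTs ->", evaluations(20), "evaluations")
for name, puts in median_puts.items():
    print(f"{name:>10}: {puts:4d} PUTs -> {evaluations(puts):8d} evaluations")
```

With these figures the cache of size 32 needs 576000
evaluations, against 230400, 140800 and 294400 for the three
experiments of the first batch.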





Figure 3: Duration of experiments vs. number of different IPs
(nodes) participating in them, with averages per number of IPs
and standard deviation shown as red dots; where there is a
single red dot, there was a single experiment in which that
many computers participated (for instance, 16 computers in the
experiment on the far left or 29 in the middle one). Shades of
blue indicate how many experiments included that many unique
IPs: a lighter shade for a column of dots indicates that a
particular number of computers happened less frequently, while
a darker shade means it happened more frequently. From left to
right and top to bottom: experiments 4/4, 4/24 and 7/31,
followed by the experiments with 50 traps, cache=128, 64, 32.

We will have to analyze experimental data a bit further to
find out why this happens and also whether there are some
patterns in the three sets of experiments. An interesting
question to ask, for instance, is whether adding more computers
makes the experiment take less time. In fact, as shown in
Figure 3, the addition of more computers does not seem to
contribute to decreasing the time needed to finish the
experiment. However, the cause-effect relationship is not clear
at all. It might be the opposite: since experiments take longer
to finish and might in fact be abandoned with no one
contributing for some time, the probability of someone new
joining them is higher. In fact, with experiments taking a few
seconds, and due to the way the experiments are announced, it
is quite unlikely that several volunteers join in such a short
period of time, even more so if we take into account that
volunteers are not carried over from previous experiments.

That is why we used a more difficult problem in the second
batch of experiments, which is shown in the bottom row
of Figure 3. The pattern is remarkably similar, showing a
positive correlation between the time for solving the problem
and the number of computers, at least for cache sizes
128 and 64. However, it is interesting to observe that, for
cache=32, the time needed to find the solution decreases
from one to approximately 4-5 nodes, and then increases for
a higher number of participating computers, distinguished
by IP. The green dot at the bottom is probably an outlier that
we will try to explain later on. This leads us to conclude that
a larger number of computers might contribute to speeding up
the solution if the time the experiment ideally takes is
sufficient, that is, of the order of a minute, and enough
volunteers concur simultaneously. This is also observed,
although not so clearly, in the case of cache=64, where
experiments with around 10 IPs take less time, between 10 and
100 seconds, than experiments with fewer or more IPs. If we
look at the graph that shows the number of IPs or volunteers
per minute for this experiment, shown in Figure 4, we see
that there are peaks of more than 25 volunteers, and a period
of several hours with a minimum of 6 computers and peaks
of more than 10. The long period after midnight where there
is a single volunteer left masks the success achieved during
this set of experiments, from which we draw two lessons:
first, you need a social network influencer to announce your
experiments and second, no matter what, do not run any
experiment after midnight. This statement, which might seem
tongue in cheek, is in fact a conclusion drawn from the
experimental data, and it shows to what extent the social
network is an essential part of the description and
performance of the NodIO volunteer computing system.

Figure 4: Simultaneous IPs every minute of the experiment
with cache = 64.

It is also interesting to check the distribution of the
experiment duration, shown in Figure 5, which roughly follows
Zipf’s law, with a similar distribution in all three runs.
The 4/24 run is the most complete and shows an S-shape,
which implies an accumulation of experiments taking a similar
time of around 100 seconds; this S-shape also appears
in the experiments with cache=128 (bottom row, left). The
most interesting part is the tail, which shows how many
experiments took a desirable amount of time, on the order of
10 seconds, and which appears in all three graphs. As can be
seen, it drops sharply, implying there are just a few
of them, with diminishing probability as time decreases.
However, since they have a greenish color, implying a low
number of IPs, they might be due to clients carrying over
from the previous experiment. This is a characteristic of this
implementation which will be examined later on but, at any
rate, if we discard those experiments that take too much or
too little time, there is a decreasing exponential distribution
that corresponds to Zipf’s law.

Figure 5: Duration of experiments vs. rank of experiments
sorted by descending duration, with the y axis in a logarithmic
scale. Dot color is related to the number of IPs participating
in the experiment. From left to right and top to bottom:
experiments 4/4, 4/24 and 7/31 and caches=128, 64, 32.



Figure 6: Number of PUTs per unique IP and fit to a Weibull
distribution (in red); the x axis shows IPs sorted by descending
number of PUTs. From left to right and top to bottom:
experiments 4/4, 4/24 and 7/31 and new experiments with
cache=128, 64, 32.

A similar exponential distribution also appears if we rank
HTTP PUTs, equivalent to the number of generations divided
by 100, or to evaluations divided by 12800, contributed by
every user, which is shown in Figure 6. These
results show a Zipf-like behavior, that is, a power law with a
small bump in the lowest values. After testing the Generalized
Extreme Value (GEV) distribution, which failed for the new
batch of experiments, we have fitted the data to a Weibull
distribution (Thoman et al., 1969), with the resulting
parameters shown in Table 3. The inverse Weibull distribution
is a special case of the GEV distribution used in those papers,
and usually appears in natural sciences and artificial life,
often related to decay. It has been frequently fitted to
volunteer computing frameworks such as SETI@home (Javadi et
al., 2009). The model that user behavior follows can be
explained straightforwardly: when users visit the page, it
draws their attention for a limited amount of time. They give
it a chance for a few seconds. If something there amuses them
or they can engage in a conversation about it, they stay for a
while longer; otherwise, they leave. The scale parameter, which
ranges from about 23 to 53 in the first batch, for the 40-trap
problem, and from about 168 to 205 for the 50-trap problem,
depends mainly on the maximum number of generations people
leave it running. Since it finishes or stalls after a number of
generations shown in Figure 2, volunteers just leave after that.
Curiously enough, the scale parameter is roughly twice the
median number of PUTs per experiment, showing that, on
average, the most loyal users reload the page twice after
finishing or after seeing that the evolution does not progress.
This rule of thumb breaks down with the last experiment,
however, which is interesting by itself, too.

Table 3: Weibull distribution parameters of the fit of the
number of PUTs per unique IP.

Experiment    Scale             Shape
4/4           43.07 ± 5.80      0.57 ± 0.03
4/24          22.97 ± 1.57      0.66 ± 0.02
7/31          53.18 ± 7.77      0.54 ± 0.03
Cache 128     205.28 ± 32.10    0.77 ± 0.07
Cache 64      178.99 ± 36.44    0.60 ± 0.05
Cache 32      168.15 ± 49.28    0.57 ± 0.07
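To make the role of the two fitted parameters concrete, the following sketch evaluates the standard two-parameter Weibull survival function and median using the scale and shape reported for experiment 4/4 in Table 3. It assumes the usual parameterisation, P(X > x) = exp(-(x/scale)^shape); how the authors' fitting procedure maps onto it is not stated in the text, so the function names and this mapping are assumptions made for illustration.

```python
import math


def weibull_survival(x, scale, shape):
    """P(X > x) for a two-parameter Weibull distribution."""
    return math.exp(-((x / scale) ** shape))


def weibull_median(scale, shape):
    """Median of a Weibull distribution: scale * (ln 2)^(1/shape)."""
    return scale * math.log(2.0) ** (1.0 / shape)


# Scale and shape for experiment 4/4, taken from Table 3.
scale_44, shape_44 = 43.07, 0.57
median_44 = weibull_median(scale_44, shape_44)
ratio_44 = scale_44 / median_44  # how many medians fit into the scale
```

Under this parameterisation, a shape value between roughly 0.5 and 0.6 places the median of the distribution at about half the scale, so part of the "scale is about twice the median" observation may follow from the shape of the fitted distribution itself.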

The slope or shape parameter, on the other hand, indicates
the overall shape of the curve. A value less than 1 indicates
a concave (in a non-logarithmic scale) curve, with figures
closer to one indicating a smaller slope. In all cases values
are between 0.54 and 0.77, independently of the experiment.
It might be the case that this number depends more on the
total number of experiments carried out, with the sets with
more experiments, both in the middle, having values between
0.60 and 0.70. The distribution is remarkably similar, which
gives us a model of user behavior that is, to a certain extent,
independent of the experiment.

These experiments show that, as was shown for other
volunteer computing frameworks and also in the case of
games, user engagement follows a Weibull distribution. This
makes engagement the key to leveraging the performance
of the socio-technical metacomputer and a way to improve
results in the future.

Conclusion 

Our intention in this paper was to assess the capabilities
of a socio-technical system formed by a client-server web-based
framework running a distributed evolutionary algorithm
and the volunteers that participate in the experiment.
These volunteers are in the cloud, that is, available as CPU
as a service for the persons running the experiment. In this
paper we have tried to put some figures on the real size of
that cloud and on how it can be used standalone if there is no
alternative or, if other computing resources are available, in
conjunction with other local or cloud-based methods to add
computing power seamlessly through the pool that
NodIO creates.

After running the experiment on the 40-trap problem,
whose running time could be as low as a few seconds, we
switched to another batch of experiments in which we used the
50-trap problem and a pool of limited size, which was reduced
across the three experiments. Our initial results
indicated that what happened on the screen (a flat graph with
no improvements, or a finished experiment) influenced the
amount of time that users devoted to the experiment, so a
longer one could yield different results; at the same time, it
could result in a big pool that might either crash the server
or return useless individuals to the volunteers. These new
experiments have shown that using the limited pool is
beneficial for finding the solution, since fewer evaluations
are needed, and also that, since the problem is more difficult,
users stay longer on the web page, making the socio-technical
computing system created by the simulation less ephemeral.

The second objective of this paper was to model user
behavior in a first attempt to predict performance. As
should be expected, the model depends on the implementation,
with contributions following a Weibull distribution,
which reflects the fact that volunteer computing follows a
model quite similar to that found for games or other online
activities. The reverse might be true: if we want to have
returning users for the experiments, it is probable that we
should gamify the experience so that, once they have done it
once, they might do it more times. In the spirit of Open
Science, this gamification might involve computing in real time
data such as those presented in this paper and showing them
on the same page, or presenting user results alongside others.

In general, linking and finding correlations between user
choices and performance is an interesting avenue to explore
in the future. Even if these experiments were published in a
similar way, one obtained up to five times more total cycles
than the one with the least number of cycles. It is also
essential to obtain volunteers as fast and simultaneously as
possible, so it is possible that the features of the social
network in terms of real-time use will also play a big role;
synchronous webs such as Snapchat, which mystifies the writers
of this paper, and Twitter, thanks to its real-time nature,
might be better suited than Facebook, LinkedIn or Google Plus.
Even though it is difficult to create controlled experiments in
this area, it is an interesting challenge to explore in the
future.

The other area to explore is the algorithmic area itself.
Are there ways to change the evolutionary algorithm, or its
visualization, so that the user has a bigger impact on the
result? One of the users on Twitter even suggested embedding
videos so that people spend time looking at them, but another
possible way would be to make the user engage with the
algorithm by giving him or her buttons to change the mutation
rate when the algorithm is stalled, for instance. If this is
combined with a score board where local performance is compared
to other users, engagement might be increased and thus the
performance of the system. In general there are many open
issues with the evolutionary algorithm implementation itself,
including using different, or adaptive, policies for inserting
and sending individuals to the pool, using different policies
for population initialization, and also the incorporation of
high-speed local resources to the pool to check what would be
the real influence of the volunteer pool on the final
performance.

Finally, the implementation needs some refinement in
terms of programming and also ease of use. Tools such
as Yeoman might be used to easily generate new experiments,
so that the user would have to create only a fitness
function, with the rest of the framework wrapped around it
automatically.

All these avenues of experimentation will be explored
openly, following the Open Science policy of our group,
which in fact contributes to establishing trust and
security between us and the volunteers and is an essential
feature of the system. That is why this paper, as well
as the data and processing scripts, is published under a
free license on GitHub at
https://github.com/JJ/modeling-volunteer-computing.
Abstract

Models of robust human-human coordination can guide the 
design of adaptive and responsive human-robot systems. Here 
we test an artificial agent that embodies low-dimensional 
nonlinear dynamic equations derived from human behavior 
while completing a two-agent herding task, where the goal is to 
contain reactive spheres to the center of a target region. The 
model was able to complete the task alongside human novices in 
a virtual version of the experimental setup used in Nalepka and 
colleagues (submitted). Not only did the model lead participants 
to successful performance, but also 12 out of 18 participants 
reported that they believed their partner was a human participant 
in another room. The model was therefore able to capture the
complex social behavior that defined robust task success in terms
of lower dimensional dynamical equations that characterize the
emergent behavioral dynamics of embedded multiagent
behavior.

Introduction 

Joint human coordination is adaptive and robust - pairs of 
individuals can work together to complete tasks, such as 
carrying a couch, through a variety of different environments 
without much difficulty. Developing robotic systems that have 
these qualities will be useful for problems that need to be solved 
in a wide range of environments and contexts, with both human 
and robot partners. One possibility for creating such adaptive 
systems is provided by models of the low-dimensional 
behavioral dynamics (Warren, 2006) that emerge when human 
pairs complete tasks successfully. Because these models 
consist of a handful of ordinary differential equations, they can 
be used to drive the behavior of socially embedded artificial 
systems and exhibit a wide range of behavioral complexity in a 
relatively compact form. 

Our approach (Richardson et al., 2015; in press) is based on
research in cognitive science inspired by dynamical systems
theory and complexity science. According to this research, the
local actions and behaviors of relatively simple systems can
give rise to complex coordinative behavior. Fairly well known
examples of this research in non-human systems include the
coordination of fish schools and bird flocks, where simple rules
at the local or individual level lead to complex coordinated
movement (Couzin, Krause, Franks, & Levin, 2005;
Hildenbrandt, Carere, & Hemelrijk, 2010; Reynolds, 1987).
Applied to psychology, this has been used to explain path
navigation and obstacle avoidance (Fajen & Warren, 2003;
2004; 2007), rhythmic movement coordination (e.g.,
Richardson et al., 2007; Schmidt & O’Brien, 1997; Schmidt &
Turvey, 1994), and anticipation (Washburn et al., 2015). In the
case of path navigation and obstacle avoidance, only a small set
of nonlinear equations is required to accurately model and
predict human movement towards a goal through a cluttered
environment (Warren & Fajen, 2003; Warren, 2006). In this
model the goal is treated as a point-attractor and obstacles, such
as desks, chairs, or other people, are treated as repellors. Fajen
and Warren’s model has been implemented successfully as the
basis for an artificial navigation system (Nemec & Lahajnar,
2009). Similar work has been done to extend this methodology
to passing objects between partners in order to reach a goal
location (Lucas et al., 2015). In the rhythmic coordination
research, artificial agents have been developed to coordinate
and produce new stable coordination patterns with partners
(Kostrubiec, Dumas, Zanone, & Kelso, 2015; Zhang, Dumas,
Kelso, & Tognoli, 2016). However, research is needed to extend
this approach to understand and model the coordinative
dynamics that emerge in more complex situations that require
the control of a dynamically changing environment. In the
remainder of the paper, we will describe and test an artificial
agent able to work with novices to contain a set of autonomous
agents as an illustration of how these types of situations can be
understood.

1 The task was inspired by shepherding behavior in dogs but does not set
out to explain actual shepherding behavior (see Strombom et al., 2014);
rather, it was referred to as such as a way to describe the task to participants.

Behavioral Task 

A two-player shepherding game was developed in Nalepka and
colleagues (submitted) to understand emergent coordination
dynamics in more complex and dynamically changing
environments. 1 In the task, dyads stood on either side of a
frosted glass tabletop (see Figure 3A) and controlled
cube-shaped agents (sheepdogs) with wireless handheld motion
sensors tracked using a Polhemus Liberty Latus motion
tracking system. Participants interacted in a two-dimensional
environment projected onto the glass top (see Figure 1). The
environment consisted of a fenced grass area and three or seven
balls (sheep). The goal of the dyad was to cooperate and
discover a solution to contain all the sheep within the red target
containment area for a certain length of time and for a certain
number of trials before time ran out (45 minutes). The sheep
could move freely within the fenced playing field, but could not
pass beyond the fenced region. If a sheep collided with the
fence, the trial ended prematurely. If the sheep collided with
one another they acted like solid objects and could not pass
through one another. When left alone the sheep exhibited
Brownian motion, but when a player’s sheepdog was within 12
cm of a sheep, the sheep reacted to the player’s proximity by
moving in the direction opposite to the sheepdog’s position.
Therefore, participants needed to learn how to push the sheep
towards the center of the field without moving so close as to
accidentally cause the sheep to scatter away from the participant.
Participants were not allowed to explicitly communicate with
one another regarding their game play or strategy while
engaged in the task.
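The sheep behavior described above (random drift plus a flight response inside a 12 cm radius) can be sketched as a single update step. Only the 12 cm threshold and the "move directly away from the sheepdog" rule come from the text; the time step, flight speed and noise level are hypothetical values chosen for illustration, and collisions between sheep and with the fence are left out.

```python
import math
import random


def sheep_step(sheep, dogs, dt=0.02, repel_radius=0.12,
               flee_speed=0.10, noise_sd=0.01, rng=random):
    """Advance one sheep (x, y in meters) by one time step.

    flee_speed, noise_sd and dt are illustrative assumptions, not values
    from the paper.
    """
    x, y = sheep
    # Brownian jitter: Gaussian increments scaled by sqrt(dt)
    x += rng.gauss(0.0, noise_sd) * math.sqrt(dt)
    y += rng.gauss(0.0, noise_sd) * math.sqrt(dt)
    for dog_x, dog_y in dogs:
        away_x, away_y = sheep[0] - dog_x, sheep[1] - dog_y
        distance = math.hypot(away_x, away_y)
        # Flee only when a sheepdog is closer than the repulsion radius
        if 0.0 < distance < repel_radius:
            x += flee_speed * dt * away_x / distance
            y += flee_speed * dt * away_y / distance
    return (x, y)


# Noise switched off to show the two regimes deterministically.
pushed = sheep_step((0.05, 0.0), dogs=[(0.0, 0.0)], noise_sd=0.0)
ignored = sheep_step((0.05, 0.0), dogs=[(0.50, 0.0)], noise_sd=0.0)
```

A sheepdog 5 cm away pushes the sheep further along the line joining them, while a sheepdog 45 cm away leaves it untouched, which is why players must approach from the far side of a sheep to drive it towards the center.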



Figure 1: A prototypical time lapsed (t1-t4) depiction of game play,
including an illustration of the search-and-recover (S&R) and coupled
oscillatory containment (COC) behavioral modes exhibited by pairs.
Arrows indicate movement direction.


Only 31 of 42 tested pairs were able to meet the success
criteria, and all but two of them discovered the same
coordinative strategy to corral the sheep. The dominant
coordinative strategy consisted of two different modes of
behavior. We will call these modes the Search & Recover
(S&R) strategy, depicted in Figure 1 (t1-2), and the Coupled
Oscillator Containment (COC) strategy, depicted in Figure 1
(t3-4). When employing the S&R mode of behavior,
participants implicitly divided the task space in half, and each
individual in the dyad pursued the sheep furthest from the
center of the target region, placing their respective marker such
that it propelled the farthest sheep on their respective side
towards the center. As the sheep clustered, an oscillatory
movement of the sheepdogs began to emerge as the result of
participants moving to similar regions of the playing field to
contain the sheep. After some time, a bifurcation occurred and
participants began to make regular antiphase (Figure 1, t3) or
inphase (Figure 1, t4) oscillatory movements. This is the
emergence of the COC strategy. Once this COC behavioral
mode emerged, dyads would begin subsequent trials by moving
directly into this behavioral mode. It was found that this
oscillatory behavior is consistent with the dynamics produced
by coupled nonlinear oscillators, which have also been used to
understand the dynamics of intra- and interpersonal rhythmic
movement (e.g., Richardson et al., 2007; Schmidt et al., 1990;
Schmidt & O’Brien, 1997; Schmidt & Turvey, 1994). These
dynamics, as described by the Haken, Kelso and Bunz model
(Haken et al., 1985; Schoner et al., 1986), indicate that two
stable oscillatory modes are possible: inphase and antiphase
behavior, with inphase behavior being more stable than
antiphase. This relative stability is consistent with the
experimental results and suggests that the dyads may be
embodying similar dynamics when performing the COC
behavior.

The Artificial Shepherd 

Given the results in Nalepka and colleagues (submitted) we 
developed an expert artificial agent (EAA) based on a model, 
visually depicted in Figure 2, which characterizes the two 
behavioral modes present in past human data (chosen values 
will be presented in parentheses after each parameter). S&R 
involved participants selecting a sheep that is furthest from the 
red target region, and moving their sheepdog so that the 
targeted sheep is in a line between the player and the central 
region. This behavior can be modeled with two sets of 
equations below: 

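$$\ddot{r}_i + b_{r,i}\,\dot{r}_i + \varepsilon_i\left(r_i - \left(\varphi_{sd,i}(t) + C_{md,i}\right)\right) = 0 \tag{1}$$

$$\ddot{\theta}_i + b_{\theta,i}\,\dot{\theta}_i + \mu_i\left(\theta_i - \varphi_{s\theta,i}(t)\,D_{sd,i}\right) = 0 \tag{2}$$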

The model operates on a polar task axis where $r_i$ is the radial
distance of the player’s sheepdog from the center of the field,
with $\dot{r}_i$ and $\ddot{r}_i$ being its velocity and acceleration
along that axis, respectively. Parameter $\theta_i$ is the angle, in
radians, from the reference angle, which is on the sagittal plane,
with $\dot{\theta}_i$ and $\ddot{\theta}_i$ being the angle’s velocity
and acceleration terms. The subscript $i$ indicates player
identification. Parameters $b_{r,i}$ (5) and $b_{\theta,i}$ (5) are the
velocity damping terms; $\varepsilon_i$ (20) and $\mu_i$ (15) scale the
rate at which player $i$ minimizes the difference between their
current position and the target radial position, $\varphi_{sd,i}$, and
angle, $\varphi_{s\theta,i}$, of the target sheep, $\varphi_s(t)$. The
parameter $C_{md,i}$ (0.05) is a fixed value which indicates the
minimal distance that the agent should stay away from the
sheep. This parameter prevents the EAA from moving on top
of the target sheep, as well as too close into the central
containment region, preventing unreliable sheep repulsion
behavior. The parameter $D_{sd,i}$ is a Heaviside parameter
defined as,

$$D_{sd,i} = \begin{cases} 0, & \varphi_{sd,i}(t) < C_{sd,i} \\ 1, & \varphi_{sd,i}(t) \geq C_{sd,i} \end{cases} \tag{3}$$


where $D_{sd,i}$ is zero when the distance of the farthest sheep is
less than a fixed parameter $C_{sd,i}$ (0.08), indicating there is no
furthest sheep to corral, and so the EAA moves towards the
reference angle axis. It should be noted that the model only
pursues sheep that are on their half of the playing field, which
is consistent with data found in Nalepka and colleagues
(submitted).

Figure 2: Illustration of the task space axes employed for the bi-agent herding
model. The model captures player i’s (where i = 1 or 2) sheepdog location at
any time within the planar game space in polar coordinates ($r_i$, $\theta_i$),
with $r_i$ corresponding to the radial distance of player i from the center of
the planar game space and $\theta_i$ corresponding to player i’s radial angle
(± 90°) from the game space axis on their side of the game field. $x_i$ is the
oscillatory perimeter path of each participant i’s sheepdog with respect to the
half ($\pi$ radians) of the target containment region closest to the
participant’s side of the game space, such that each participant’s perimeter
path, $x_i$, is centered on the participant’s radial ($r_i$, $\theta_i$)
position within the game space. Adapted from Nalepka and colleagues
(submitted) with permission from the authors.
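The search-and-recover dynamics can be illustrated numerically. The sketch below reads Eqs. 1 and 2 as two damped springs: the radial one rests at the target sheep's radial distance plus the minimal distance (0.05), and the angular one rests at the sheep's angle, or at the reference angle when the Heaviside switch of Eq. 3 is zero. That reading, the function name, the integration scheme and the initial conditions are assumptions; the parameter values are the ones quoted in the text, and the target sheep is held fixed for simplicity.

```python
def simulate_search_and_recover(sheep_r, sheep_theta, r0=0.0, theta0=0.0,
                                t_end=10.0, dt=0.001,
                                b_r=5.0, b_theta=5.0, eps=20.0, mu=15.0,
                                c_md=0.05, c_sd=0.08):
    """Return the sheepdog's (r, theta) after t_end seconds for a fixed target sheep."""
    # Heaviside switch (Eq. 3): chase the sheep only if it is beyond c_sd
    d_sd = 1.0 if sheep_r >= c_sd else 0.0
    r, r_vel = r0, 0.0
    theta, theta_vel = theta0, 0.0
    for _ in range(int(round(t_end / dt))):
        r_acc = -b_r * r_vel - eps * (r - (sheep_r + c_md))
        theta_acc = -b_theta * theta_vel - mu * (theta - sheep_theta * d_sd)
        # Semi-implicit Euler: update velocities first, then positions
        r_vel += r_acc * dt
        theta_vel += theta_acc * dt
        r += r_vel * dt
        theta += theta_vel * dt
    return r, theta


# A sheep far from the center is approached from just behind it ...
r_far, theta_far = simulate_search_and_recover(sheep_r=0.30, sheep_theta=0.5)
# ... while a sheep inside c_sd sends the agent back to the reference angle.
r_near, theta_near = simulate_search_and_recover(sheep_r=0.05, sheep_theta=0.5,
                                                 theta0=0.4)
```

In the first case the agent settles at the sheep's angle and slightly further out than the sheep, which places the sheep between the agent and the center; in the second case the angular target collapses to zero.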

When a certain level of proficiency is achieved, players 
transition from the S&R behavior to the COC mode of behavior 
whose global behavioral dynamics is consistent with the HKB 
model (Haken et al., 1985). To be consistent with the previous 
research modeling the dynamics of rhythmic human limb 
movements (Kay et al., 1987), and rhythmic interlimb and 
interpersonal coordination (see Haken et al., 1985; Kelso, 1995; 
Schmidt & Richardson, 2008), the COC mode of behavior was 
modeled using a set of coupled Rayleigh/van der Pol hybrid 
nonlinear oscillators of the form, 

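$$\begin{aligned}
\ddot{x}_1 + \alpha_1\dot{x}_1 + \beta_1\dot{x}_1^3 + \gamma_1 x_1^2\dot{x}_1 + \omega_1^2 x_1 &= (\dot{x}_1 - \dot{x}_2)\left(A + B(x_1 - x_2)^2\right) \\
\ddot{x}_2 + \alpha_2\dot{x}_2 + \beta_2\dot{x}_2^3 + \gamma_2 x_2^2\dot{x}_2 + \omega_2^2 x_2 &= (\dot{x}_2 - \dot{x}_1)\left(A + B(x_2 - x_1)^2\right)
\end{aligned} \tag{4}$$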

where $i = 1$ for player 1 and $i = 2$ for player 2, and the
positive/negative (excitatory/inhibitory) damping parameter
$\alpha_i$ is scaled as a function of sheep distance using the linear
equation,

$$\dot{\alpha}_i = \delta_i\left(C\,\varphi_{sd,i}(t) - D\,C_{sd,i} - \alpha_i\right) \tag{5}$$

Parameter $x_i$ indicates the player’s position on the circular $x$
task axis, centered on the player’s ($r_i$, $\theta_i$). The possible
range of motion is roughly a half circle ($\pi$ radians) about
$\theta_i$, with $\dot{x}_i$ and $\ddot{x}_i$ being its velocity
and acceleration along the $x$ task axis. Note that $x_i$ in Eq. 4
outputs a range of values roughly (-1, 1), which is then multiplied
by a constant factor to convert the value into game space units
after iteration. The parameter $\omega_i$ ($\tfrac{7}{4}\pi$) defines
the stiffness or frequency of the oscillator of player $i$’s
perimeter path movement, and the functions ($\beta_i\dot{x}_i^3$) (1)
and ($\gamma_i x_i^2\dot{x}_i$) (5) correspond to the Rayleigh and
van der Pol escapement terms for each player’s $x_i$ perimeter
path movement, respectively. The coupling
function to the right of the equals sign in each equation is the
same as that previously derived by Kelso and colleagues (e.g.,
see Haken et al., 1985; Kelso, 1995), and defines both inphase
(0°) and antiphase (180°) relative phase relationships as the
stable coordination modes between the two oscillators (when
$\alpha_i < 0$), with the relative strength of these two coordination
modes defined by the parameters $A$ and $B$ (both -0.2). The
system is bi-stable when $|4B| > |A|$, but mono-stable (inphase
only) when $|4B| < |A|$.
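As a rough numerical illustration of the coupled oscillators in Eq. 4, the sketch below integrates two identical Rayleigh/van der Pol oscillators with the coupling term and tracks how far apart their positions are. The stability check implements the |4B| > |A| condition quoted above. Several things are assumptions rather than facts from the paper: the frequency is taken as 7π/4 (our reading of the value in the text), the damping is held fixed at -1 instead of evolving via Eq. 5, and the initial conditions and Runge-Kutta integrator are arbitrary choices.

```python
import math


def is_bistable(A, B):
    """Bistability condition from the text: both inphase and antiphase are stable."""
    return abs(4 * B) > abs(A)


def derivatives(state, alpha, beta, gamma, omega, A, B):
    """Time derivatives (dx1, dv1, dx2, dv2) of the coupled system."""
    x1, v1, x2, v2 = state
    coupling_1 = (v1 - v2) * (A + B * (x1 - x2) ** 2)
    coupling_2 = (v2 - v1) * (A + B * (x2 - x1) ** 2)
    a1 = coupling_1 - alpha * v1 - beta * v1 ** 3 - gamma * x1 ** 2 * v1 - omega ** 2 * x1
    a2 = coupling_2 - alpha * v2 - beta * v2 ** 3 - gamma * x2 ** 2 * v2 - omega ** 2 * x2
    return (v1, a1, v2, a2)


def rk4_step(state, dt, **params):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))
    k1 = derivatives(state, **params)
    k2 = derivatives(shifted(state, k1, dt / 2), **params)
    k3 = derivatives(shifted(state, k2, dt / 2), **params)
    k4 = derivatives(shifted(state, k3, dt), **params)
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


params = dict(alpha=-1.0, beta=1.0, gamma=5.0,
              omega=7 * math.pi / 4, A=-0.2, B=-0.2)
state = (0.10, 0.0, 0.12, 0.0)   # two oscillators started almost together
dt, n_steps = 0.002, 30000       # 60 simulated seconds
gap = []
for _ in range(n_steps):
    state = rk4_step(state, dt, **params)
    gap.append(abs(state[0] - state[2]))
early_gap = max(gap[:2500])      # largest position difference in the first 5 s
late_gap = max(gap[-2500:])      # largest position difference in the last 5 s
```

Starting close to inphase, the position difference shrinks over time, which is the inphase attractor described in the text; a start near antiphase would be needed to probe the second stable mode.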

The change between the two modes of behavior, S&R and
COC, is the Hopf bifurcation in the variable $x_i$ that occurs for
each oscillator system. The bifurcation is driven by Eq. 5: when
$\alpha_i > 0$, behavior along the $x_i$ task axis corresponds to that
of a nonlinear damped mass spring with a stable fixed point
solution; when $\alpha_i < 0$, behavior along the $x_i$ task axis
corresponds to that of a nonlinear limit cycle oscillator, with an
amplitude of movement approximately equal to

$$x_{i,amp} = 2\sqrt{|\alpha_i| / \gamma_i} \tag{6}$$

when $\alpha_i < 0$, $\beta_i > 0$, $\gamma_i > 0$ and
$|\alpha_i| \ll \omega_i$ (see Kay et al., 1987 for more details). The
value $\alpha_i$ in Eq. 5 at any instant in time, ($t$), is a
differential function of the distance, $\varphi_{sd,i}(t)$, of
the furthest sheep on player $i$’s side of the game space with
respect to a maximum safe containment distance, $C_{sd,i}$, with
parameter $\delta_i$ (25) controlling the rate at which
$\dot{\alpha}_i$ reaches 0. Parameters $C$ (9) and $D$ (8) adjust the
weight given to $\varphi_{sd,i}(t)$ and $C_{sd,i}$ in determining
$\dot{\alpha}_i$. If the distance, $\varphi_{sd,i}(t)$, of
the sheep furthest from the center of the game space on player
$i$’s side of the game space is outside player $i$’s maximum safe
containment distance, $C_{sd,i}$, and $\alpha_i > 0$, then behavior
along the $x_i$ task axis corresponds to that of a nonlinear damped
mass spring. Conversely, if the distance, $\varphi_{sd,i}(t)$, of the
sheep furthest from the center of the game space on player $i$’s
side is inside player $i$’s maximum safe containment distance,
$C_{sd,i}$, and $\alpha_i < 0$, then behavior along the $x_i$ task
axis corresponds to that of a nonlinear limit cycle oscillator.
Values near zero produce minute scrubbing behavior by the agent
which, anecdotally, is consistent with past human data.
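The switch between the two regimes can be checked with a small simulation. The sketch assumes Eq. 5 is linear in the sheep distance, so that for a constant distance the damping settles at C times the distance minus D times the safe distance, and it then integrates a single uncoupled oscillator with that damping held fixed. The function names, the frequency of 7π/4, the sample sheep distances and the integration settings are illustrative assumptions.

```python
import math


def alpha_fixed_point(sheep_distance, weight_c=9.0, weight_d=8.0, c_sd=0.08):
    """Value the damping settles to under Eq. 5 for a constant sheep distance."""
    return weight_c * sheep_distance - weight_d * c_sd


def late_amplitude(alpha, beta=1.0, gamma=5.0, omega=7 * math.pi / 4,
                   x0=0.05, t_end=40.0, dt=0.001):
    """Integrate one uncoupled oscillator and return max |x| over the last 5 s."""
    x, v = x0, 0.0
    n_steps = int(round(t_end / dt))
    tail_start = n_steps - int(round(5.0 / dt))
    peak = 0.0
    for step in range(n_steps):
        acc = -alpha * v - beta * v ** 3 - gamma * x ** 2 * v - omega ** 2 * x
        v += acc * dt   # semi-implicit Euler: velocity first,
        x += v * dt     # then position with the updated velocity
        if step >= tail_start:
            peak = max(peak, abs(x))
    return peak


alpha_far = alpha_fixed_point(0.30)    # sheep far from the center
alpha_near = alpha_fixed_point(0.03)   # sheep well inside the safe distance
amp_far = late_amplitude(alpha_far)    # damped spring: motion dies out
amp_near = late_amplitude(alpha_near)  # limit cycle: oscillation persists
```

With the sheep far away the damping is positive and the oscillation decays to the fixed point (S&R-like behavior); with the sheep contained the damping turns negative and a sustained oscillation appears (COC-like behavior).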

After piloting, changes to how the target sheep, $\varphi_s(t)$, is
determined were made from the model’s original formulation
in Nalepka and colleagues (submitted). The following
Heaviside function,

$$\varphi_s(t) = \begin{cases} \min_{j}\, d^{\mathrm{fence}}_{j}, & \alpha_i > \tau \\ \max_{j}\, r_{d,j}, & \alpha_i < \tau \end{cases} \tag{7}$$

determines which sheep the agent will pursue. Value
$d^{\mathrm{fence}}_{j}$ represents the distance from a given sheep
$j$ to the closest fence segment, $r_{d,j}$ represents the radial
distance of sheep $j$ from the center of the target region, and
$\tau$ (0.5) represents a fixed parameter determining when the sheep
selection strategy changes. Parameter $\tau$ is indirectly derived
from a specific distance from the center of the field, but is more
intuitively operationalized as being associated with a change in
the mode of behavior, with $\min_{j} d^{\mathrm{fence}}_{j}$ used
when engaging in S&R behavior, and $\max_{j} r_{d,j}$ when utilizing
COC behavior. Equation 7 was included because, due to the
rectangular playing field, the EAA was placed in situations where
it would pursue a sheep that was further from the center while
trials failed because sheep closer to the center hit the fence on
the short end of the rectangle. This post-hoc method was included
to prevent the model from repeatedly letting its partner down.
Additionally, the function that determines the target sheep was
altered to select the position of the target sheep + 1% of its
normalized velocity vector. This was done to prevent situations
where the model would cause the target sheep to spiral around
the target region due to the sheep maintaining its tangential
velocity.
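The target-selection rule can be sketched as a small function. It follows the description above: when the damping parameter is above the threshold (0.5), the agent targets the sheep closest to a fence; otherwise it targets the sheep furthest from the center. The function name, the coordinate convention (positions in meters relative to the field center) and the use of the 1.17 m by 0.62 m field dimensions reported in the Method section are assumptions made for the example; the velocity look-ahead is omitted.

```python
import math


def select_target(sheep, alpha, half_width=0.585, half_height=0.31, tau=0.5):
    """Pick the sheep to pursue from a list of (x, y) positions."""
    def fence_distance(position):
        x, y = position
        # Distance to the nearest side of the rectangular fence
        return min(half_width - abs(x), half_height - abs(y))

    def radial_distance(position):
        return math.hypot(*position)

    if alpha > tau:
        # S&R regime: rescue the sheep most at risk of hitting the fence
        return min(sheep, key=fence_distance)
    # COC regime: pursue the sheep furthest from the center
    return max(sheep, key=radial_distance)


flock = [(0.40, 0.00),   # far from the center, 18.5 cm from the fence
         (0.00, 0.28)]   # closer to the center, only 3 cm from the fence
target_sr = select_target(flock, alpha=2.0)
target_coc = select_target(flock, alpha=-0.3)
```

The example flock shows why the two criteria differ on a rectangular field: the sheep nearest the short side's fence is not the one furthest from the center.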

Current Study 

The present research set out to test and validate the above 
model against a set of novice participants. For the current study, 
the coupling term in Eq. 4 was not included. This was done in 
order to determine if the COC behavior would emerge without 
the explicit coupling to the participant. The study had three 
aims: 1) to determine whether the EAA can perform the herding
task with a human novice, 2) to determine whether novices
learn to coordinate with the artificial agent to produce COC
behavior consistent with data presented in Nalepka and
colleagues (submitted), and 3) to determine whether participants
continue to believe that they are performing the task alongside
a human partner, as a way to test the humanness of the model.

Method 

Participants 

Eighty participants took part in the study. Thirty pairs were 
formed in the novice control condition, and the remaining 20 
completed the EAA condition. All participants received 
research credit as part of a class requirement for an 
undergraduate Psychology course. 


Apparatus & Task 

The task was designed using the Unity 3D game engine 
(version 5.2.0; Unity Technologies, San Francisco, California) 
and was presented to participants via Oculus Rift DK2 (VR) 
headsets (Oculus VR, Irvine, California). The presented virtual 
environment (Figure 3B) was modeled at 1:1 scale after the 
experimental room (Figure 3A). The task was presented in the 
VR headset to appear on a virtual tabletop modeled at 1:1 scale 
after the glass tabletop in the real environment which then acted 
as the solid physical surface for participants to move their 
motion sensors on. Participants used wireless Latus motion
tracking sensors operating at 96 Hz (Polhemus Ltd, Vermont,
USA). Participants moved the sensor along the glass tabletop,
and hand movements translated 1:1 to movements of the
player’s cube (sheepdog) in the virtual environment.
Participants were given a body in the virtual world, modeled 
after a crash test dummy of height 1.8m whose motion was 
controlled using an inverse kinematic calculator (model and 
calculator supplied by Root Motion, Tartu, Estonia) based off 
the real movements of the participant’s right hand (via a Latus 
motion sensor) and head (via the Oculus Rift). Regardless of 
height, each participant was calibrated to fit the body of the 
virtual model, giving each participant an equivalent image size 
of the playing area. The reason for virtual bodies was threefold: 
to best emulate the conditions in the original research by 
Nalepka and colleagues (submitted) where participants’ arms 
were able to occlude the playing surface, to provide 
approximate information regarding the arm location of one’s 
partner (to avoid hitting), and to further increase the 
believability that participants were interacting with a human in 
the EAA condition. Separate computers were used to power the 
headsets and data was transferred to the host computer via a 
LAN connection. The maximum display latency between the 
participants’ real-time movements and the virtual (box) 
sheepdog was 33 ms. Game states were continuously recorded 
at 50 Hz, including the movements of the virtual sheep and the 
participant controlled sheepdogs. 

Participants were able to move their sheepdogs anywhere in 
two dimensional space within the 1.17m by 0.62m fenced area 
of the grass task field. The goal of the game was to contain 3 or 
7 wool covered stimulus balls (sheep) within the red circular 
target containment region measuring 19.2 cm in diameter for 
70% of the last 45 seconds of a 60 second game trial (the first 
15 seconds of each trial served as time for participants to 
initiate a behavioral coordination strategy and/or corral the 
sheep). All sheep needed to be inside this region for it to count 
towards the dyad’s score. Participants received visual feedback 
regarding their performance at the end of each completed trial (i.e.,
what percentage of time they managed to keep the sheep within 
the target area). A game trial ended prematurely (i.e., before 60 
seconds), however, if one of the sheep managed to hit the 
perimeter fence or if all sheep escaped the 29 cm white circle 
that surrounded the red target region circle. At the start of a 
trial, the sheep were distributed within the red target 
containment region (Figure 3c-d), with the subsequent motion 
of the sheep governed by random Brownian motion dynamics. 
The sheep also dynamically reacted to the participant 
controlled sheepdogs as if threatened, being repelled away from 
a participant’s sheepdog when the sheepdog was within 12 cm
of the sheep’s game location. When threatened, the sheep
would move directly away from the player at a speed
proportional to the inverse of the squared distance between the
sheep and the player. It is also important to note that sheep were
programmed to be able to collide (as opposed to pass through) 
each other. Finally, pairs played the game for a maximum of 45 
minutes, with the experiment ending either at the end of this 
experimental period or earlier if the pairs made it past the 70% 
success threshold eight times. 
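
The sheep dynamics described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Unity implementation used in the study: the function name, the repulsion gain k, and the noise scale sigma are hypothetical, and sheep-sheep collisions are omitted. Only the 12 cm threat radius, the inverse-square speed law, and the 50 Hz rate are taken from the text.

```python
import math
import random

THREAT_RADIUS_CM = 12.0   # from the text: a dog closer than this repels a sheep
DT = 1.0 / 50.0           # game states were recorded at 50 Hz


def step_sheep(sheep, dogs, k=50.0, sigma=0.5, rng=random):
    """Advance one sheep position (x, y), in cm, by one time step.

    k (repulsion gain) and sigma (Brownian noise scale) are assumed values.
    """
    x, y = sheep
    # Brownian motion: independent Gaussian displacement on each axis.
    dx = rng.gauss(0.0, sigma) * math.sqrt(DT)
    dy = rng.gauss(0.0, sigma) * math.sqrt(DT)
    for dog_x, dog_y in dogs:
        rx, ry = x - dog_x, y - dog_y
        dist = math.hypot(rx, ry)
        if 0.0 < dist < THREAT_RADIUS_CM:
            speed = k / dist ** 2           # speed proportional to 1 / d^2
            dx += speed * (rx / dist) * DT  # move directly away from the dog
            dy += speed * (ry / dist) * DT
    return (x + dx, y + dy)
```

With the noise switched off (sigma=0), a sheep 5 cm from a dog moves straight away from it, while a sheep 20 cm away is unaffected.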



Figure 3: (A) Experimental Room, (B) Virtual replica of (A) presented 
to participants. (C-D) Depiction of the initial arrangement of the 3 and 
7-sheep conditions, respectively from the perspective of the 
participant. 

In the novice control condition, participants stood on either 
side of the table as depicted in Figure 3B. In the EAA condition, 
participants were told that their partner had come early and was 
set up to complete the task remotely in a room next door. In the 
EAA condition, the model controlled its respective sheepdog 
via the model presented above. To give additional realism, the 
EAA model body’s head was programmed to linearly 
interpolate its gaze direction towards the target sheep position, 
<Ps(t)> with ± 2.5 cm movement noise. Additional noise was 
added to the model’s radial (±1^ to Eq. 1) and oscillatory 
movement (±^^r to Eq. 4). In both conditions, participants 

were not able to see their partner in between trials so as to avoid 
the display of task-irrelevant behavior that was not 
incorporated into the EAA. 

Procedure 

Prior to arrival participants were randomly assigned to either 
the three or seven sheep condition. Following informed 
consent, during which time participants were told that they 
would be required to play a virtual shepherding game, 
participants were either led into the testing room and
randomly assigned to opposite sides of the tabletop display (the
novice control condition), or told that their partner had come
early and would perform the task in a different room (the EAA
condition). Each participant was then handed a wireless Latus
motion tracking sensor and informed that they would be using
these motion tracking sensors to control their respective cube
(sheepdog) in order to corral a set of balls (sheep) into the red 
target region of the grass game field. Participants were 
instructed to hold the motion sensors with their right hand and 
control their sheepdogs by sliding the sensors on top of the 
tabletop display. This ensured that the location and movement 
of their corresponding sheepdog was aligned with the motion 
tracking sensors. Participants were then shown the game field 
and the rules of the game were described (i.e., rules for trial 
success and failure detailed above). Importantly, no 
instructions about how to best play the game or how to 
coordinate or corral the sheep within the game region were 
provided. Participants were simply told to complete the task to 
the best of their ability. Participants in the novice control 
condition were told that they were not allowed to talk or 
verbally strategize at any time during the experimental session 
(neither within nor between trials). An experimenter was
present during the experimental session to enforce this no- 
talking rule. After the experiment, participants were debriefed 
on the purpose of the study and participants in the EAA 
condition were asked questions regarding their interaction with 
their partner. The first question asked participants if they
noticed anything odd in the experiment, the second question
asked if they had a feeling that their partner had completed this
task before, and the third question asked if at any point in the
experiment they doubted that they were completing
the task with a human.

Results 

The first aim was to determine whether the EAA can complete the
herding task alongside novices. One pair in the novice control
condition and two participants in the EAA condition were
excluded from all analyses due to program malfunction. Of the
remaining participants, those who met the winning criterion,
which was to keep the sheep contained in the red target region
for 70% of the last 45 seconds of a trial eight times, were
kept for analyses. Twenty-three of the
remaining 29 pairs (79.3%) in the novice control condition met 
the winning condition, and all 18 participants in the EAA 
condition met the winning condition. This confirms that the 
developed EAA is able to perform the task alongside human 
novices. 
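
The trial-level success criterion can be written out as a short Python sketch. It assumes one boolean per recorded 50 Hz frame indicating whether all sheep were inside the red target region; the function names are hypothetical, and treating exactly 70% as a success is an assumption, since the text does not state how ties were handled.

```python
def trial_score(all_inside, sample_rate_hz=50, trial_s=60, scored_s=45):
    """Percentage of the last `scored_s` seconds with every sheep contained.

    all_inside: one boolean per recorded frame. Returns None when the trial
    ended before `trial_s` seconds (fence hit or all sheep escaped).
    """
    if len(all_inside) < trial_s * sample_rate_hz:
        return None
    window = all_inside[-scored_s * sample_rate_hz:]
    return 100.0 * sum(window) / len(window)


def is_success(all_inside, threshold_pct=70.0):
    """True when a completed trial reaches the containment threshold."""
    score = trial_score(all_inside)
    return score is not None and score >= threshold_pct
```

For example, a full 60-second trial in which the sheep are contained for 36 of the last 45 seconds scores 80% and counts as a success, whereas a trial that ends early never does.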

To investigate differences in performance between groups, 
several summary statistics and performance variables were 
considered. A 2 (Condition: control, EAA) x 2 (Sheep: 3, 7) 
between-subjects ANOVA was conducted on the amount of 
time dyads took to complete the experiment. A significant 
condition x sheep interaction was found, F(1,37) = 7.75, p =
0.008, η² = 0.17. For the novice control condition, a significant
main effect was found for the number of sheep, F(1,20) =
21.18, p < 0.001, η² = 0.50, such that less time was taken to
complete the 3-sheep condition (15.66 minutes) than the 7- 
sheep condition (29.07 minutes) (p < 0.001; all post-hoc tests 
in this paper use Bonferroni corrections). No significant 
difference was found in the EAA condition (3-sheep: 16.53 
minutes, 7-sheep: 18.86 minutes; p = 0.37). Next, a 2 
(Condition: control, EAA) x 2 (Sheep: 3, 7) between-subjects 
ANOVA was conducted to see if there were differences in the
amount of time dyads were able to keep the sheep in the red
target region on successful trials, measured as the percentage
of time within the last 45 seconds of each trial. A significant
main effect of condition was found, F(1,36) = 7.78, p = 0.008,
η² = 0.18, such that pairs in the novice control condition had a
lower score on average (83.14%) than those who completed the
task alongside the artificial model (87.80%) (p < 0.008). Not
only were participants in the EAA condition able to keep the
sheep in the red target region for a longer percentage of time,
but the amount of time taken to complete the experiment was
consistent across sheep herd conditions.

Condition         Sheep     Inphase           Antiphase         Both-Phase        No/Other-Phase
Novice Control    3-Sheep   82.14% (4.29%)    5.36% (2.53%)     7.14% (3.13%)     5.36% (2.85%)
Novice Control    7-Sheep   66.67% (9.77%)    16.67% (8.07%)    8.33% (2.95%)     8.33% (2.95%)
Artificial Agent  3-Sheep   48.44% (12.15%)   26.56% (9.28%)    1.56% (1.56%)     23.44% (8.33%)
Artificial Agent  7-Sheep   42.50% (11.21%)   21.25% (5.29%)    12.50% (5.59%)    23.75% (6.57%)

Table 1: Proportion of relative phase modes observed during COC behavior. Note: Both-Phase corresponds to trials in which pairs produced
significant periods of both inphase and antiphase coordination. No/Other-Phase represents data remaining that were not categorized under the
first three labels. Standard error in parentheses.

Another measure to compare game performance across 
groups involved analyzing the movement of the sheep. In this 
measure, differentiating what makes a dyad better at containing 
the sheep is operationalized as a dyad’s ability to minimize the 
spread of the sheep and to minimize their movement from the 
center of the red target region. These qualities were measured 
by taking the normalized average area of the convex hull that 
spans over the sheep (measure of spread/sheep), as well as 
taking the average root mean square (RMS) of the sheep’s 
position from the center (measure of variability from center). 
Separate 2 (Condition: control, EAA) x 2 (Sheep: 3, 7) 
between-subjects ANOVAs were conducted on each of these 
dependent measures. First, significant differences between
testing conditions, F(1,37) = 32.00, p < 0.001, η² = 0.46, and
sheep herd size, F(1,37) = 5.71, p = 0.02, η² = 0.13, on the
average sheep spread were found, such that sheep in the EAA
condition were less dispersed (3.80 cm²/sheep) than in the
control condition (5.52 cm²/sheep) (p < 0.001), and that sheep
in the 7-sheep condition took up more area (5.02 cm²/sheep)
than those in the 3-sheep condition (4.29 cm²/sheep) (p < 0.01).

Second, significant main effects were found for the root 
mean square (RMS) of the sheep’s distance from the center of 
the red containment region across condition, F(1,37) = 18.79,
p < 0.001, η² = 0.34, and sheep herd size, F(1,37) = 41.76, p <
0.001, η² = 0.53, such that the average sheep RMS was lower
in the EAA condition (3.59 cm) than the control condition (4.18
cm) (p < 0.001) and that sheep in the 7-sheep condition had a
lower RMS (3.44 cm) than in the 3-sheep condition (4.33 cm) (p
< 0.001). These results indicate that participants in the EAA
condition were better able to both minimize the spread of the 
sheep, as well as to keep the sheep closer to the center of the 
red containment region. Along with the results above, this is 
most likely due to the EAA producing relatively consistent 
expert behavior across all participants and trials. Consistent 
with findings from Nalepka and colleagues (submitted), it is
more difficult to keep the sheep in the center of the containment
region, as indicated by the RMS results, in the 3-sheep than in
the 7-sheep condition. This is due to sheep-sheep collisions,
which are more likely in the 7-sheep condition. Collisions cause
the formation of sheep clusters, which behave as small moving
masses that are easier to control than individual sheep.
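
The two sheep-movement measures can be illustrated for a single recorded frame with the following Python sketch. It is one plausible reading of the description, not the analysis code used in the study: the function names are hypothetical, the hull is computed with Andrew's monotone chain algorithm, and "normalized" is assumed to mean division by the number of sheep. Averaging over the frames of a trial is left to the caller.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def spread_per_sheep(sheep):
    """Convex hull area (shoelace formula) divided by the number of sheep."""
    hull = convex_hull(sheep)
    twice_area = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1])
    )
    return abs(twice_area) / 2.0 / len(sheep)


def rms_from_center(sheep, center=(0.0, 0.0)):
    """Root mean square distance of the sheep from the target center."""
    squared = [(x - center[0]) ** 2 + (y - center[1]) ** 2 for x, y in sheep]
    return math.sqrt(sum(squared) / len(squared))
```

With positions in cm, spread_per_sheep returns cm² per sheep and rms_from_center returns cm, matching the units reported above.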

The second aim was to determine whether participants in the 
EAA condition are able to discover the COC mode of behavior 
as a successful strategy to contain the sheep, as well as whether 
these participants would reproduce the dynamics of nonlinear 
coupled oscillators, whose behavior produces globally stable 
in/antiphase behavior as modeled by the HKB model (Haken et 
al., 1985). To test this, relative phase analyses were conducted 
on the last 45 seconds of each successful trial. A time-series of
each pair’s instantaneous relative phase was calculated by
applying the Hilbert transform to the radial angle of each
player’s sheepdog with respect to a reference axis (see Pikovsky
et al., 2001 for details about this transformation). A 4th-order
Butterworth band-pass filter excluding frequencies under
0.375 Hz and above 5 Hz was applied to exclude oscillations
that happen on a timescale irrelevant to the task. Distributions
of the absolute relative phase angles that occurred across six 30°
regions (i.e., 0°-30°, 30°-60°... 150°-180°) of relative phase
between 0° and 180° were calculated for each trial. For these
distributions,
inphase and antiphase coordination is indicated by a 
concentration of relative phase angles near 0° and 180°, 
respectively (Schmidt & O’Brien, 1997; Richardson et al., 
2005). In order to determine if a pair’s distribution of relative
phases was significantly inphase or antiphase, 1000 random
relative phase time-series matching the sample length (45 sec)
and sample rate (50 Hz) were created to generate 1000 surrogate
random relative phase distributions. The 950th value in
ascending order (the 95th percentile) for each
30° relative phase region—17.933%— was then employed as 
the statistical threshold value and corresponded to a 0.05 
significance level (Varlet & Richardson, 2015). In/antiphase 
coordination was deemed to have occurred for a given trial if 
the percentage of occurrence of relative phase angles for the 0- 
30°/150-180° relative phase region was greater than the 
17.933% threshold. In addition, intermittent (both) inphase and 
antiphase were noted to have occurred for a given trial if both 
the 0-30° and 150-180° regions were greater than the 17.933% 
threshold level. Table 1 provides a summary of the proportion 
of trials averaged across pairs that were statistically classified 
as inphase, antiphase, intermittent in/antiphase (both-phase), or 
no stable/other stable-phase relationship. 
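
The classification procedure above can be sketched in Python with NumPy. This is an illustrative reconstruction under assumptions, not the analysis code used in the study: the function names are hypothetical, the analytic signal is obtained with a standard FFT-based Hilbert transform, the Butterworth band-pass step is omitted (inputs are assumed to be filtered already), and the 17.933% threshold is taken from the text rather than regenerated from surrogate data.

```python
import numpy as np

THRESHOLD_PCT = 17.933  # surrogate-based threshold reported in the text


def instantaneous_phase(x):
    """Phase of the analytic signal, via an FFT-based Hilbert transform."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x - x.mean())
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0   # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.angle(np.fft.ifft(spectrum * weights))


def phase_distribution(angle_a, angle_b):
    """Percentage of samples in each 30-degree bin of absolute relative phase."""
    diff = instantaneous_phase(angle_a) - instantaneous_phase(angle_b)
    wrapped = np.degrees(np.angle(np.exp(1j * diff)))  # wrap to [-180, 180]
    rel = np.clip(np.abs(wrapped), 0.0, 180.0)
    counts, _ = np.histogram(rel, bins=np.arange(0, 181, 30))
    return 100.0 * counts / rel.size


def classify(pct):
    """Label a trial from its six-bin relative phase distribution."""
    inphase = pct[0] > THRESHOLD_PCT      # 0-30 degree region
    antiphase = pct[-1] > THRESHOLD_PCT   # 150-180 degree region
    if inphase and antiphase:
        return "both-phase"
    if inphase:
        return "inphase"
    if antiphase:
        return "antiphase"
    return "no/other-phase"
```

Two identical 1 Hz sinusoids sampled for 45 seconds at 50 Hz are classified as inphase, and a sinusoid paired with its negation is classified as antiphase.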

Inspection of Table 1 reveals that over 75% of trials in both
the novice control and EAA conditions can be classified as
showing in/antiphase rhythmic coordination (inphase,
antiphase, or both) used to complete the task. Additionally,
the ratio between inphase and antiphase trials is consistent
with previous research on visual rhythmic coordination (e.g.,
Schmidt et al., 1990; Schmidt & O’Brien, 1997; Richardson et
al., 2007), such that inphase coordination occurs more often
than antiphase coordination.

One reason for the larger proportion of trials classified as
No/Other-Phase for dyads in the EAA condition may be
the lack of a coupling term in the EAA used in the present
experiment. Although the EAA was unable to adjust its
frequency to establish a stable inphase/antiphase relationship 
with its partner, participants nevertheless fell into stable 
in/antiphase coordination for a majority of trials, giving 
evidence that even in conditions with unidirectional coupling, 
participants are able to adjust their movement to the EAA’s 
movements, allowing stable inphase/antiphase coordination to 
still be possible in most instances. However, if the EAA’s 
inability to adjust its frequency to that of its partner is a reason 
for the higher percentage of No/Other-Phase trials in the EAA 
condition, then it is expected that a larger average oscillatory 
frequency difference would be observed in the EAA condition 
overall, as opposed to the novice control condition. A 
participant’s oscillatory frequency was determined by first 
computing a spectral analysis on the z-scores of each 
participant’s radial angle data. As noted above, these data
were filtered using a 4th-order Butterworth band-pass filter
excluding frequencies under 0.375 Hz and above 5 Hz in order
to include only oscillatory frequencies that are relevant to the
COC behavior. The frequency with the most power was taken as
each participant’s peak frequency, and the absolute difference
between the two peak frequencies in each dyad was
submitted to a 2 (Condition: control, EAA) x 2 (Sheep: 3, 7)
between-subjects ANOVA. A significant main effect of
condition was found, F(1,37) = 6.02, p = 0.02, η² = 0.14, such
that participants in the EAA condition had a greater frequency
difference at their respective fundamental frequencies (0.14 Hz)
than those in the control condition (0.07 Hz), supporting the
possibility that the greater frequency difference is responsible
for the higher percentage of No/Other-Phase trials in the EAA
condition.
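
The peak frequency difference can be estimated along the following lines. This Python sketch is an assumption-laden illustration rather than the original analysis: the function names are hypothetical, a plain FFT power spectrum stands in for the unspecified spectral analysis, and restricting the search to the 0.375-5 Hz band stands in for the Butterworth filter.

```python
import numpy as np


def peak_frequency(angle, sample_rate_hz=50.0, band=(0.375, 5.0)):
    """Frequency (Hz) with the most spectral power inside the pass band."""
    x = np.asarray(angle, dtype=float)
    z = (x - x.mean()) / x.std()                 # z-score the radial angle
    power = np.abs(np.fft.rfft(z)) ** 2
    freqs = np.fft.rfftfreq(z.size, d=1.0 / sample_rate_hz)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return freqs[in_band][np.argmax(power[in_band])]


def peak_frequency_difference(angle_a, angle_b, sample_rate_hz=50.0):
    """Absolute difference between the two partners' peak frequencies."""
    return abs(peak_frequency(angle_a, sample_rate_hz)
               - peak_frequency(angle_b, sample_rate_hz))
```

For a 45-second window the frequency resolution is 1/45 Hz, so differences on the order of those reported (0.07 Hz and 0.14 Hz) span only a few spectral bins.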

The final aim was to determine whether participants would
continue to believe that their partner was a human for the entire
duration of the experiment. Debriefing in the EAA condition
indicated that 12 out of 18 participants (66.67%) believed that
their partner was a human for the entire duration of the
experiment. Participants were mixed in their views on whether
their partner was as naive to the study as they were, or whether
their partner had been given additional information about the
task (or was a confederate). Some noted that their partner
exhibited some
“quirky” behavior or found it odd that the partner was able to 
discover the oscillatory strategy so quickly. Three of the 
remaining six had passing thoughts that their partner may in 
fact be a computer, while the remaining three had strong 
convictions that their partner was a computer for the entire 
duration of the experiment. In order to determine whether the
difference in peak oscillation frequency is associated with the
belief that one’s partner was a computer, a Pearson’s
correlation was conducted on participants in the EAA
condition. The participants who had no thoughts that their
partner could be non-human were placed in the Believed
Human bin (12 individuals), while those participants who either
had doubts or were confident that their partner was a computer
were placed in the Thought Computer bin (6 individuals). The
correlation was significant, r(16) = 0.60, p = 0.009, such that
those who thought their partner was a human had lower peak
frequency differences with their EAA partner than those who
weren’t fully convinced their partner was human, suggesting
stronger unidirectional coupling for those in the Believed
Human bin. It remains an open question as to what aspects of
the EAA made participants question or reject that they were
performing the task with a human partner.

Discussion 

The research presented in this paper represents a first
pass at developing a minimal bio-inspired artificial model that
can serve as a partner in a two-player herding task. The current
model demonstrates its ability to complete the task alongside 
participants while also convincing 12 of the 18 participants to 
report believing that their partner was a human for the entire 
duration of the experiment. Future work involves adding a 
coupling term to the model in order to give it the ability to 
modulate its oscillatory frequency to match the frequency of its 
partner. This may further enhance the realism of the model. 
However, it is interesting to note that even without the model 
keeping track of the partner’s state, the unidirectional coupling 
of the participants still led to inphase, antiphase or both-phase 
coordination with the partner for over 75% of trials, showing 
the human tendency to form these stable phase relationships in 
the context of relatively complex task constraints and a 
dynamically changing environment. A reviewer expressed 
concern that the behavior the EAA exhibited is too prescribed 
to the specific environment presented in this paper, and may 
not be appropriate for environments where sheep need to be
contained in a triangular or otherwise shaped containment region.
Future work will have to test this possibility. Another avenue 
of interest is to further elucidate the perceptual variables human 
agents attune to as they complete this task. The current model 
was built to mimic the behavioral dynamics observed in human 
hand movement data, without considering the role perception 
plays in the task. 

The EAA algorithm implemented in the herding task 
provides an example of how modeling human behavioral 
dynamics can guide the design of socially embedded artificial 
agents. Note, the proposed model does not aim for the most 
efficient or optimal model, but the one that best matches the 
relevant human behavioral dynamic. This model identifies the 
low dimensional dynamics that drive the coordination behavior 
required to accomplish the task with another agent and is 
designed to strike a balance between providing an exact
detailed description of every aspect of a specific human agent 
(i.e., white-box modeling) and a black box model that might be 
implemented in any number of ways. 
Abstract

Complex systems are challenging for students, especially 
younger students, to learn. In this paper, we argue that agent- 
based models (ABMs) of social insects provide an engaging 
and effective space for students to learn powerful ideas about 
complex systems. We designed a curricular unit called 
BeeSmart centering on ABMs of honeybees’ collective 
behavior. Preliminary results from an implementation at a high 
school showed that ABMs of social insects could be a 
promising approach to introduce complex systems to a younger 
audience. 


Introduction 

The study of complex systems in the past few decades has 
provided scientists with powerful frameworks to investigate 
phenomena that were difficult to understand through classic 
scientific methods (Bar-Yam, 1997; Jacobson & Wilensky 
2003). Through a complex systems lens, scientists see the 
behavior of a system at the macro level as emerging from the 
interactions of its individual elements (or agents) at the micro 
level (Epstein, 1999). The complex systems approaches are 
not only scientifically powerful, but also pedagogically 
important, because concepts and methods, such as emergence, 
self-organization, positive feedback loop, and agent-based 
modeling, have the potential for students to develop new 
intellectual horizons and new explanatory frameworks that cut 
across multiple disciplines (Jacobson & Wilensky 2003). 

Teaching students complex systems principles is both an 
opportunity and a challenge. Educational research on 
students’ learning about complex systems shows that they 
have significant difficulties in understanding complex 
systems: The aggregated properties of complex systems 
usually appear to be disconnected from their constituting 
agents’ details (Miller & Page, 2007), which makes 
emergence counterintuitive to students. Wilensky and Resnick 
(1999) describe students’ difficulties with complex systems as 
a “deterministic-centralized mindset” (aka DC mindset). 
Novices tend to see complex systems as a deterministic 
“clockwork” system, where elements of the system are 
interconnected like gears in clockwork (Jacobson, 2001). 
Novices also tend to think that for patterns to emerge, 
centralized leadership is necessary. 

Empirical studies suggest that Agent-Based Models 
(ABMs) can lower the threshold of learning complex systems 
and are effective in helping students overcome their DC 
mindset (e.g., Sengupta & Wilensky 2009). ABMs are 


computational models that simulate the actions and 
interactions of agents—individual parts of a complex 
system—and provide a view to assess the effects of these 
interactions on the system as a whole. NetLogo (Wilensky, 
1999) is a widely used agent-based modeling environment in 
both scientific research and in education. Much work has been 
done using NetLogo ABMs to teach existing curricular 
contents with a complex systems view (Levy & Wilensky 
2008; Sengupta & Wilensky, 2009), but few projects focus on 
explicitly teaching complex systems principles. 

ABMs of social insects—honeybees, ants, wasps, and 
termites—could be a productive way to teach complex 
systems principles. Social insects’ behavior demonstrates 
several core complex systems principles that are prevalent in 
natural and artificial complex systems, including positive 
feedback loops, randomness, interaction, and homogeneity. 
These principles can explain the apparent disconnection 
between the system and its parts: how the intelligence of 
swarms emerges from a collection of non-intelligent insects 
that follow a set of very simple rules. Learning about social 
insects’ behavior can help students make sense of complex 
systems principles in rich contexts and provide students a 
mental model to think with when explaining similar complex 
systems. 

What makes social insects more appropriate as an entry 
point to study complex systems is that they are both familiar 
and foreign to students. Students have learned many aspects 
about bees and ants from an early age, because these social 
insects have been a popular topic in children’s fairy tales, in 
school curricula, and even embedded in many different 
cultures and languages. Yet, very few people understand the 
details of insect colonies’ behavior, especially how the overall 
behavior of a colony can emerge from simple interactions 
between individual insects. ABMs of social insects allow 
students to draw prior knowledge about familiar topics to 
build new knowledge about complex systems. 

The BeeSmart Curricular Unit 

We designed a curricular unit called BeeSmart for high school 
students to learn complex systems principles. The unit is 
based on Seeley’s (2010) research findings about how 
honeybees pick their new hive site. The swarm’s decision¬ 
making process is best explained through a complex systems 
lens. The system consists of multiple agents that obey simple 
behavioral rules: if a scout bee discovers a potential hive site, 
she inspects it. Then she goes back to the swarm and reports 
the location, distance, and quality (suitability) of the site
by performing a waggle dance. The waggle dance is bees’
medium of communication. Bees have the instinctual ability to
assess the quality of potential hive sites. If the quality is high, 
bees dance enthusiastically for a long period of time to 
advertise it; if the quality is low, they make a few brief 
lackluster dances or do not dance at all. In this way, bees 
encode differential preference into the dances: the longer the 
dance, the better the site, and the stronger the signal. Because 
dances for higher quality sites are presented in the swarm for a 
longer time, neutral scouts—the observers in the swarm—are 
more likely to see such dances. When they see a dance, they 
become recruited and proceed to inspect the advertised hive 
site. Such simple behavior and interactions between dancers 
and observers at the micro level result in a positive feedback 
loop: signals advocating high quality hives are amplified.
Usually, the hive site with the highest quality eventually 
receives all the support and wins out. 
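
The behavioral rules above can be sketched as a small agent-based simulation in Python. This is a simplified illustration, not the BeeSmart NetLogo model: every name and parameter value is hypothetical, space and flight are ignored, and it is assumed that a scout returns to the neutral state once her dance ends.

```python
import random


def simulate_swarm(qualities, n_scouts=100, ticks=300, max_dance=40,
                   discover_p=0.01, seed=0):
    """Return (dancers per site, number of neutral scouts) after `ticks` steps.

    qualities: site quality in (0, 1]; dance length is proportional to it.
    """
    rng = random.Random(seed)
    site = [None] * n_scouts      # site each scout dances for (None = neutral)
    dance_left = [0] * n_scouts   # remaining dance time per scout
    for _ in range(ticks):
        dancing_for = list(site)  # snapshot of who is dancing this tick
        for i in range(n_scouts):
            if site[i] is not None:
                dance_left[i] -= 1
                if dance_left[i] <= 0:
                    site[i] = None  # dance over: back to neutral
                continue
            if rng.random() < discover_p:
                found = rng.randrange(len(qualities))  # independent discovery
            else:
                # Watch a random bee; recruited only if that bee is dancing.
                found = dancing_for[rng.randrange(n_scouts)]
            if found is not None:
                site[i] = found
                # Better sites earn longer dances, so they are seen more often.
                dance_left[i] = max(1, round(max_dance * qualities[found]))
    counts = [sum(1 for s in site if s == k) for k in range(len(qualities))]
    return counts, site.count(None)
```

A call such as simulate_swarm([0.3, 0.9]) returns the number of dancers per site and the number of neutral scouts. Because longer dances are observed more often, support for the higher-quality site is expected to grow at the expense of the other, which is the positive feedback loop described above.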

In the BeeSmart Hive Finding model (See Figure 1), each 
bee obeys the rules described above. In addition, students can 
use sliders to manipulate environmental variables and bees’ 
behavior. The plots show the count of bees with different 
states and how their support converges over time. In addition 
to the model, we designed instructional materials including
short readings on complex systems and bees, short videos of
bees’ waggle dance, illustrations of bees’ hive-seeking
environment in the real world, instructions on how to use the
model, and questions about the model.





Figure 1. BeeSmart Hive-Finding Model. 


School Implementation and Results 

We conducted a small-scale pilot study of the BeeSmart unit 
in a high school mathematical modeling class in an ethnically
diverse suburb of Chicago. All 14 students in the class
participated in the study. The intent of this study is not to 
generalize the findings from such a small sample. Instead, we 
would like to look closely at how students interact with 
BeeSmart and what is possible to achieve. 

Through the 5-day unit, students were highly engaged in 
honeybees’ hive-finding phenomenon and were motivated to 
answer the driving question “How does a swarm of 10,000 
bees pick the best potential hive site from many choices?” The 
14 students in the class worked in pairs, interacted with the 
model, went through the materials, asked questions, and 
participated in discussions. To test what complex systems 
principles students learned from the unit, we conducted pre- 
and post-tests asking the students to explain the hive-finding 
phenomenon. In addition, we asked students to interact with 


another NetLogo model about ants’ foraging phenomenon 
(Wilensky 1997) and explain how it works using principles 
they learned from the honeybees. Nine students completed
both the pre- and post-tests. Four students were selected as
focal students to represent different genders and levels of prior
knowledge of computational modeling. We interviewed the
focal students both before and after they used the unit.

In the pre-test, the students showed relatively high 
knowledge about bees. They knew that bees’ behavior was 
shaped by evolution, and one student even knew about 
“swarm mentality”. However, most students’ answers showed 
a deterministic mindset, such as the bees splitting into a certain
number of groups to find new hives. In the post-test, all 
students provided more sophisticated explanations with 
complex systems principles such as randomness, interaction, 
feedback loop, and homogeneity to explain the phenomena of 
both honeybees and ants. The four students interviewed 
elaborated on the mechanism of bees’ hive finding, 
incorporating randomness involved in insects’ movement and 
interactions to explain how the group level behavior emerged 
from simple rules and randomness. 

For future work, we will develop more curricula centering 
on ABMs of social insects for students to explore complex 
systems. We will also conduct studies at a larger scale to test 
the effectiveness of this approach. 
Abstract

In this paper we explore a novel perspective on
surveillance robotics, which is based on a coordination
principle of honeybees, and on the integration
of an autonomous telepresence robot in such a system.
Coordination principles, based on biological systems
such as ant, bee and termite colonies, show several
properties which are essential to multi-robot
surveillance, including low computation load, robustness,
scalability and adaptability. In this paper we aim to
improve on the efficiency of such a robotic swarm by
taking a human in the loop by means of a telepresence
robot. The human operator controlling the telepresence
robot will aim to speed up the convergence of
the swarm. The experiments, which evaluate the proposed
multi-robot coordination system both in simulation
and on real robots, show how the telepresence
robot substantially increases the efficiency of the
process.

Introduction 

In recent years there has been a rapidly growing interest
in using teams of mobile robots for automatically
surveilling environments of different types and complexity.
This interest is mainly motivated by the broad
spectrum of potential civilian, industrial, and military
applications of multi-robot surveillance systems
(Kuorilehto et al., 2005; Folgado et al., 2007). Examples
of such applications are the protection of safety-critical
technical infrastructures, the safeguarding of
country-borders, and the monitoring of high risk regions
and danger zones which cannot be entered by
humans in the case of a nuclear incident, a bio-hazard,
or a military conflict.

Triggered by this interest, today automated surveil¬ 
lance is a well-established topic in multi-robot re¬ 
search, which is considered to be of particular practi¬ 
cal relevance. Despite the remarkable progress made 
on this research topic so far, there is still a huge gap 
between theory and practice of multi-robot surveil¬ 
lance systems, and as a consequence there are still 
only very few on-field deployments. The reason for 
this is that many basic questions about coordination 
among mobile robots are not yet answered in a satis¬ 
factory way. 

In this paper a new approach to multi-robot
surveillance systems is proposed, which is based on
a bio-inspired coordination principle from swarm
intelligence and on the integration of an autonomous
telepresence robot in such a system.

Natural entities, such as ant and termite colonies,
improve their collective performance by influencing
one another through local messages they deposit
in their shared environment. In computer science, 
robotics and economics a number of computational 
variants have been developed, and it has been shown 
that they allow for very efficient distributed control 
and optimization in a variety of problem domains. 
For instance, recent work shows a strong potential
for artificial systems that mimic insect behaviour
to solve complex coordination tasks such as routing
on the internet, mobile ad hoc network routing, and
robotic tasks (Lemmens and Tuyls, 2012; Dressler and
Akan, 2010; Floreano and Mattiussi, 2008).

Swarm optimisation algorithms, like ant colony op¬ 
timisation (Dorigo et al., 2006), rely on pheromone 
trails to mediate (indirect) communication between 
agents. These pheromones need to be deposited and 
sensed by agents while they decay over time. Though 
easy to simulate, artificial pheromones are hard to 
bring into real-life robotic applications. However, re¬ 
cently non-pheromone-based algorithms were devel¬ 
oped as well (Lemmens, 2011). Such algorithms are 
inspired by the foraging and nest-site selection be¬ 
haviour of (mainly) bees. In general, bees explore the 
environment in search for high quality food sources 
and once returned to the hive they start to dance in or¬ 
der to communicate the location of the source. Using 
this dance, bees recruit other colony members for a 
specific food source. The algorithm we use draws
inspiration from these insect behaviours with the goal of
creating intelligent systems for distributed coordination
that can be deployed in real-world settings.

The key idea put forward in this paper is that a 
telepresence robot can improve upon the efficiency of 
such a swarm. Telepresence robotics is a form of tele¬ 
operation, namely the extension of a person’s sens¬ 
ing and manipulation capability to a remote location, 
in which a human operator acts as a supervisor,
intermittently communicating information about goals and
actions relative to a specific task. The human operator
will receive information about accomplishments,
difficulties and, as requested, raw sensory data, while the
subordinate telepresence robot executes the task based on
information received from the human operator plus 
its own sensing and artificial intelligence (Sheridan, 
1989). In the approach we propose in this paper the 
human operator controlling the telepresence robot can 
observe the environment and will aim to steer the be¬ 
haviour of the swarm by means of direct communica¬ 
tion. 

In the following sections we introduce telepresence 
robotics and the biological background of our forag¬ 
ing approach. Then we show our experiments and dis¬ 
cuss the efficiency of the algorithm and the improve¬ 
ment obtained by integrating a telepresence robot in 
the system. 

Telepresence robotics 

Already more than 30 years ago, artificial intelligence 
pioneer Marvin Minsky (Minsky, 1980) laid out an 
ambitious plan calling for the development of ad¬ 
vanced teleoperated robotics systems that would re¬ 
sult in a remote-controlled economy. He coined the 
term “telepresence” to describe these systems, which 
in his futuristic vision would transform work, man¬ 
ufacturing, energy production, medicine and many 
other facets of modern life. Although the idea of 
a teleoperated robot for remote presence is not new, 
only recently have telepresence robots become available
to the broader public (Lazewatsky and Smart, 2011; 
Takayama et al., 2011; Tsui et al., 2011). Basically, 
telepresence robotics systems can be described as em¬ 
bodied video conferencing on wheels, providing a 
physical presence and independent mobility in addi¬ 
tion to communication, unlike other video conferenc¬ 
ing technologies, allowing the user to interact more 
naturally in the remote office environment. 

However, telepresence robots can be deployed in 
a wide range of application domains: the informal 
meeting scenario in offices, in hospitals to allow doc¬ 
tors to provide consultations from a distance (Tsui 
et al., 2011) or to pay a virtual visit when it is not 
possible to be present in person, or to give people with
restricted mobility a new way to interact beyond their
physical limits. Furthermore, many work sites are
hazardous to human health or even survival. With
telepresence robotics it will potentially be possible to
operate in dangerous environments without such risks.

Adding a level of autonomy to a telepresence robot 
can greatly improve the experience of the user, as it 
reduces their cognitive load. This allows the user to focus
more attention on the interaction and the task and
less on controlling the robot (Tsui et al., 2011). How¬ 
ever, it remains important for the operator to have 
control over the behaviour of the system. Indeed, as 
a telepresence robot is controlled from a remote lo¬ 
cation, precise control and feedback of the robot is 


required. One possible solution, assisted navigation, 
is investigated by Takayama et al. (2011). Adding 
more autonomy and integrating the findings of recent 
AI research into the platform can greatly increase the
usability of these robots. 

Biological coordination 

A great deal of research in swarm intelligence is sit¬ 
uated in the area of bio-inspired computation; more 
precisely in the area that investigates algorithms that 
find inspiration from nature in order to develop novel 
computational models, often to solve coordination 
problems. Foraging is one of the coordination problems
in this domain. Essentially it consists of two
sub-problems: path construction/planning and path 
exploitation/repair. The task of foraging consists of 
gathering objects out of the environment and return¬ 
ing them to a central point, most often the starting 
location. A commonly used method for solving for¬ 
aging problems focuses mainly on the behaviour of 
social insects such as ants and bees. 

Ants deposit pheromone on the path they take dur¬ 
ing travel. Using this trail, they are able to navigate 
toward the food location and communicate with other 
members of the colony, not directly but by accumulat¬ 
ing pheromone trails in the environment. Pheromone 
strength indicates the “fitness” of a trail but is not 
able to indicate direction, therefore an ant is not able 
to know a priori to which destination it is travelling. 
When a trail is strong enough, other ants are attracted 
and will follow it towards a destination which results 
in a reinforcement of the trail. This is known as an 
autocatalytic process: the more ants follow a trail, the 
more that trail becomes attractive for being followed. 
Short paths are reinforced more often over time and 
will eventually be preferred. This principle is used to 
address several problems, such as the routing problem
(Di Caro et al., 2005) and area coverage with robots 
(Wagner et al., 1999; Ranjbar-Sahraei et al., 2012). 
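
The autocatalytic reinforcement described above can be illustrated with a small mean-field sketch. It is not taken from the cited work: the function name `reinforce`, the evaporation rate and the deposit rule (traffic divided by path length) are assumptions chosen only to show how the shorter of two paths comes to dominate.

```python
def reinforce(lengths, steps=200, evaporation=0.1):
    """Mean-field pheromone dynamics on a set of alternative paths.

    lengths: path lengths (shorter is better).
    Returns the final probability of choosing each path.
    All parameter values are illustrative assumptions.
    """
    pheromone = [1.0] * len(lengths)  # start without any preference
    for _ in range(steps):
        total = sum(pheromone)
        # Ants pick a path in proportion to its pheromone strength.
        choice = [p / total for p in pheromone]
        # Existing pheromone decays; each path then receives a deposit that is
        # proportional to its traffic and inversely proportional to its length
        # (shorter trips are completed, and hence reinforced, more often).
        pheromone = [
            (1.0 - evaporation) * p + c / length
            for p, c, length in zip(pheromone, choice, lengths)
        ]
    total = sum(pheromone)
    return [p / total for p in pheromone]


# Two paths to the same food source, one twice as long as the other.
probabilities = reinforce([1.0, 2.0])
```

With equal path lengths the choice probabilities stay at one half each, so in this sketch the preference arises purely from the length-dependent reinforcement.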

On the other hand, bees and desert ants do not use 
pheromones to navigate in unfamiliar environments. 
Their navigation mainly consists of Path Integration 
(PI). The PI vector represents the continuously up¬ 
dated knowledge of direction and distance and, as a 
consequence, bees are able to return to their starting 
point by choosing the direct route rather than their 
outbound trajectory. More precisely, when the path 
is unobstructed, the insect exploits previous search
experience. However, when the path is obstructed, 
the insect has to fall back on other navigation strate¬ 
gies such as exploration (Collett and Collett, 2009). 
For recruitment bees communicate with other colony 
members by means of a waggle dance performed
in the hive. The direction of the food source is read 
from the angle between the sun and the axis of a bee’s 
waggle segment on the vertical hive comb, while the 
duration of the waggle phase is a measure of the 
distance to the food and the “fitness” of a solution 
Figure 1: MITRO interface and Turtlebots foraging supervised by a telerobot.


(von Frisch, 1967). More precisely, depending upon 
the strength of the dance, more bees are attracted and 
follow the PI vector toward a destination. Further¬ 
more, the more bees follow a PI vector, the more that 
destination will be communicated and the more it will
attract other bees. Eventually, the best solution
prevails.
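
As a rough illustration of the dance code described above, the following sketch converts a dance into a PI vector. It assumes that the dance angle is measured clockwise from vertical and maps onto the same angle clockwise from the sun's azimuth; the function name `decode_dance` and the calibration constant `metres_per_waggle_second` are hypothetical and not taken from the cited literature.

```python
import math


def decode_dance(dance_angle_deg, waggle_duration_s, sun_azimuth_deg,
                 metres_per_waggle_second=1000.0):
    """Turn a waggle dance into a compass bearing, a distance and a vector.

    dance_angle_deg: angle of the waggle run, clockwise from vertical.
    sun_azimuth_deg: compass bearing of the sun, clockwise from north.
    metres_per_waggle_second: hypothetical calibration constant.
    """
    bearing = (sun_azimuth_deg + dance_angle_deg) % 360.0
    distance = waggle_duration_s * metres_per_waggle_second
    # Compass convention: east is the sine component, north the cosine.
    east = distance * math.sin(math.radians(bearing))
    north = distance * math.cos(math.radians(bearing))
    return bearing, distance, (east, north)


# A dance 90 degrees to the left of vertical, with the sun in the north-east.
bearing, distance, vector = decode_dance(-90.0, 0.5, sun_azimuth_deg=45.0)
```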

Transferring these principles to algorithms is the 
domain of computational swarm intelligence. Com¬ 
parisons of these algorithms (Lemmens et al., 2008) 
show that the bee-inspired mechanism is able to col¬ 
lect all the items in the environment faster than the 
ant-inspired mechanism in a relatively unobstructed 
environment. However, in environments that contain
more obstacles and/or are dynamic, the bee-inspired
mechanism is less adaptive.

System & Approach 

The main idea of the proposed approach is to integrate 
swarm algorithms with telepresence robotics. We 
build on previously developed algorithms in swarm 
robotics, aiming to achieve a food foraging applica¬ 
tion in the real world guided by a telepresence robot 
that will be shepherding the swarm. 

Similar to the Path Integration principle, the robots 
in our swarm estimate their positions by integrat¬ 
ing information coming from the gyroscope and the 
wheel odometry. Using this, the robots can always 
compute the home vector (HV), and if the food lo¬ 
cation is seen, the path integration (PI) vector can be 
used to communicate the location to other robots. 
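
A minimal dead-reckoning sketch of this idea is given below. It assumes that each odometry step delivers a heading change (from the gyroscope) and a travelled distance (from the wheels); the class `PathIntegrator` and its method names are illustrative and are not the interface used on the robots.

```python
import math
from dataclasses import dataclass


@dataclass
class PathIntegrator:
    """Dead-reckoning pose estimate, with the hive at the origin."""
    x: float = 0.0
    y: float = 0.0
    heading: float = 0.0  # radians, 0 = initial facing direction

    def update(self, distance, turn):
        """Apply one odometry step: rotate by `turn` (gyroscope), then
        advance `distance` (wheel odometry) along the new heading."""
        self.heading = (self.heading + turn) % (2.0 * math.pi)
        self.x += distance * math.cos(self.heading)
        self.y += distance * math.sin(self.heading)

    def home_vector(self):
        """Vector from the robot back to the hive, in the hive frame."""
        return (-self.x, -self.y)

    def pi_vector(self):
        """Vector from the hive to the current position; when the robot is
        at the food, this is what it would communicate to other robots."""
        return (self.x, self.y)


robot = PathIntegrator()
robot.update(distance=1.0, turn=0.0)            # 1 m straight ahead
robot.update(distance=1.0, turn=math.pi / 2.0)  # turn left, 1 m more
```

Because every step adds its own measurement error, the estimate drifts over time, which is why the robots need a fallback when the hive or food is not where the integrated vector predicts.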

Therefore, no map of the environment needs to be 
built by the robots and the only common reference 
point that is needed for the correct communication of 
the food locations is the hive location, i.e. the HV. 


As a consequence, if the odometry is faulty, the 
robot might not find the hive or food location, and 
if this problem occurs the robots fall back to search
mode. As soon as the hive or the food are seen again, 
the robots update their internal reference system. 

In contrast to the honeybees’ behaviour, we also
allow communication outside of the hive, since it is
very likely that the robots see each other under way.
Additionally, there is also a probability that the robots
return to the hive after being in the search state for 
a long time, in order to increase the chance to meet 
another robot that might already be in foraging mode. 

This approach has been demonstrated to work rea¬ 
sonably well for small environments (Alers et al., 
2014a,b). However, there was no human supervision 
involved and also no simulation runs were performed 
to gain empirical insights about the performance of 
the swarm, i.e. how long it takes the swarm to con¬ 
verge on the food locations, what is the throughput of 
the system, etc. 

In this paper, we propose a novel approach to add 
a human shepherd to the system, which can supervise 
the swarming robots and help to enable faster conver¬ 
gence. The idea is that a human can interact with the 
swarm using a telepresence robot as a shepherd. The 
human operator can have more knowledge of the
environment, e.g. through a map and a camera. After a food
location has been found, the shepherd can steer the swarm
towards that location or catch “lost” swarm robots. 

We implemented the approach using the Turtlebot 1 
platform as swarm robots and a custom-built telepres¬ 
ence robot MITRO (Alers et al., 2013) as shepherd. 
These platforms will be explained in more detail in 
the next subsections. 

1 http://www.turtlebot.com
Figure 1 shows an overview of the system. On the
left, the interface for the human controlling MITRO 
is shown. It gives an overview of the system’s sta¬ 
tus, allows the user to control the robot and shows the 
live video feed of the environment. Additionally, the 
internal view of the robot is shown, below the video 
feed. The reference frame is depicted as axes, and
the two circles with arrows are the detected robots. 
On the right, a picture of the real-life experiment is 
shown, where MITRO is shepherding in the middle 
of several Turtlebots. 

Swarm robots 

As explained before, for the real world experiments, 
we use the Turtlebot platform. It has a laptop on 
board with a core-i3 CPU for computation, running 
the Robot Operating System (ROS) (Quigley et al., 
2009) framework. The robots are also equipped with 
a Kinect sensor and the RGBD information is used 
to detect and locate AR markers, see black and white 
markers in Figure 1. This sensor is also used for the 
obstacle detection, together with three bumpers
located in the front half of the robot.

To enable visual robot-robot detection every 
Turtlebot has six unique markers, oriented in a way 
that at least one marker is always visible. To track 
and decode these markers we use the ROS wrapper 
of the ALVAR toolkit 2 . We use a customised bundle 
detection method to determine the center of the de¬ 
tected robot. Each marker in the bundle encodes the 
robot number and its location with respect to the cen¬ 
ter of the robot. This information is used to predict 
the position of the detected robot. Kalman filtering
is also applied to obtain a more stable and accurate
estimate of the detected robot’s position, heading and
speed. These parameters are also used for collision
avoidance.
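
To make the filtering step concrete, the sketch below shows a scalar Kalman filter that smooths a single noisy coordinate. This is a deliberate simplification: the filter described above tracks position, heading and speed jointly, whereas the class `ScalarKalman` and its noise parameters are assumptions chosen purely for illustration.

```python
class ScalarKalman:
    """One-dimensional Kalman filter with a constant-state model."""

    def __init__(self, estimate, variance, process_noise, measurement_noise):
        self.estimate = estimate                    # current state estimate
        self.variance = variance                    # uncertainty of the estimate
        self.process_noise = process_noise          # assumed model noise (Q)
        self.measurement_noise = measurement_noise  # assumed sensor noise (R)

    def predict(self):
        """Time update: the state is assumed constant, so only the
        uncertainty grows."""
        self.variance += self.process_noise

    def update(self, measurement):
        """Measurement update: blend prediction and measurement according
        to their relative uncertainties."""
        gain = self.variance / (self.variance + self.measurement_noise)
        self.estimate += gain * (measurement - self.estimate)
        self.variance *= (1.0 - gain)
        return self.estimate


# Smooth a short sequence of noisy readings of a marker's x coordinate.
kalman = ScalarKalman(estimate=0.0, variance=1.0,
                      process_noise=0.01, measurement_noise=0.25)
smoothed = []
for reading in [2.1, 1.9, 2.05, 1.95, 2.0]:
    kalman.predict()
    smoothed.append(kalman.update(reading))
```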

Communication between Turtlebots is realised 
over wi-fi using a UDP connection to each Turtlebot. 
Even though global communication would be possible,
we limit the communication of each robot to its
own channel and allow communication only after
visual detection of its peer. Therefore, a robot can
communicate with another robot only if it is in close
proximity.

In order to avoid collisions between robots we 
rely on the marker detection to predict positions and 
speeds of the other robots. The obtained informa¬ 
tion could be used to efficiently compute a non¬ 
colliding speed vector (Claes et al., 2012). In con¬ 
trast to the previous approach, in which the robots 
avoided each other by using a global reference frame 
and broadcasting the positions to all robots via Wi¬ 
Fi, we adapted this method to only rely on the marker 
detection and the predictions using a Kalman filter. 
However, a few collisions still might occur due to 

2 http://www.wiki.ros.org/ar_track_alvar


the failed detection of other robots and, additionally,
in configurations in which the robots cannot see
each other because of the limited field of view of the
Kinect sensor.

Telepresence robot 

In addition to the Turtlebot platform we also use a
custom-built telepresence robot, shown in the right 
panel of Figure 1 (Alers et al., 2013). The advantage 
of using a custom-built system over a commercial 
platform is the flexibility, extensibility and knowledge
of the complete system, which is crucial for our
purpose.

The robot has a height of approximately 160 cm 
and is based on the Parallax Mobile Robot Base kit, 
which includes the base plate, powerful motors and 
6 inch wheels with pneumatic tires. The sensors in¬ 
clude a low-cost LIDAR, an Asus XTION PRO 3D 
sensor, sonar sensors, and two cameras (one pointing 
forward for conversations, one fish-eye camera point¬ 
ing downwards for driving). The robot is also running 
ROS. 

Since the robot is controlled from a remote lo¬ 
cation, we implemented low level autonomy on the 
robot in the form of assisted teleoperation. With 
assisted teleoperation the robot follows the steering 
commands of the operator except for a situation when 
there is a high chance of collision. This can easily oc¬ 
cur when the user is not experienced in navigating the 
robot, the network connection is delayed or an ob¬ 
stacle suddenly appears in front of the robot. Addi¬ 
tionally, the video feed can be switched from front- 
to down-facing, and is augmented with a projection 
of the expected navigational path. Furthermore, the 
robot is able to perform SLAM (simultaneous local¬ 
ization and mapping) to build a map of its environ¬ 
ment (Thrun et al., 2005); this map is used subse¬ 
quently for localization and autonomous navigation 
to a chosen destination, or back to its charging
location, all to ease the remote operation.
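
The assisted teleoperation rule can be sketched as a filter between the operator's command and the motors. The function `assisted_command`, the two distance thresholds and the linear slow-down are assumptions made for illustration; the text above only states that the operator's commands are followed unless a collision is likely.

```python
def assisted_command(linear, angular, min_range_ahead,
                     stop_distance=0.4, slow_distance=1.2):
    """Return the (linear, angular) velocity actually sent to the motors.

    linear, angular: velocities requested by the operator.
    min_range_ahead: closest obstacle in the driving direction, in metres.
    stop_distance, slow_distance: hypothetical safety thresholds.
    """
    if linear <= 0.0 or min_range_ahead >= slow_distance:
        return linear, angular  # no risk ahead: obey the operator
    if min_range_ahead <= stop_distance:
        return 0.0, angular     # too close: block forward motion only
    # In between, scale forward speed linearly with the remaining clearance.
    scale = (min_range_ahead - stop_distance) / (slow_distance - stop_distance)
    return linear * scale, angular


clear_path = assisted_command(0.5, 0.1, min_range_ahead=3.0)
blocked_path = assisted_command(0.5, 0.1, min_range_ahead=0.2)
```

Turning is left untouched in this sketch so that the operator can always steer away from an obstacle.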

Experiments 

In our experiments the Turtlebots are performing a 
foraging task, starting at the hive (H) location and ran¬ 
domly exploring the unknown environment for a spe¬ 
cific food (F) location. The robots can also locate the 
food location by asking passing robots for a known
food location, see Figure 2. When the source is found 
the Turtlebots start to exploit this source, driving from 
the food to the hive, where they drop the food, until 
the food is depleted or another source is found. The 
telepresence robot works as a “shepherd” sending rel¬ 
ative location information to the Turtlebots. 
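
The foraging behaviour described above can be summarised as a small state machine. The sketch below is one reading of the text, not the state machine that runs on the robots; the state names and event strings are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    SEARCH = auto()   # random exploration for a food location
    TO_FOOD = auto()  # driving towards a known food location
    TO_HIVE = auto()  # carrying food back to the hive


# (current state, event) -> next state. Event names are illustrative.
TRANSITIONS = {
    (State.SEARCH, "food_reached"): State.TO_HIVE,       # found food directly
    (State.SEARCH, "location_received"): State.TO_FOOD,  # told by robot or shepherd
    (State.TO_FOOD, "food_reached"): State.TO_HIVE,
    (State.TO_HIVE, "hive_reached"): State.TO_FOOD,      # drop food, go again
    (State.TO_FOOD, "food_missing"): State.SEARCH,       # depleted or odometry drift
    (State.TO_HIVE, "hive_missing"): State.SEARCH,
}


def step(state, event):
    """Return the next state; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


state = State.SEARCH
for event in ["location_received", "food_reached", "hive_reached"]:
    state = step(state, event)
```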

We implemented our approach on the real robots as
well as in simulation, in order to gather additional
statistics. This section describes the simulation
results; the real-world setting is shown in the
Demonstration section. Simulations are run in real time using
Figure 2: Multi-robot foraging. (a) All robots start at the hive (H) location. (b) Robots are exploring the unknown
environment randomly. The left two robots have found the food (F) location and are foraging between the hive
and the food location. (c) All robots have converged to foraging behaviour.
Figure 3: The different simulation environments with the shepherding robot (black square), food sources (red
squares), and 9 robots located at the hive location (blue square): (a) 5x5 simulation environment with 1 food
source; (b) 10x10 simulation environment with 2 food sources; (c) 10x10 L-shaped simulation environment with
3 food sources.


Stage (Gerkey et al., 2003; Vaughan, 2008). We use 
simulated Turtlebots and a simple differential drive 
robot as telepresence robot. For the detections, mock- 
ups are written, so that the same state-machine is run 
on the real robots and in simulation. Having the sim¬ 
ulation setup allows us to investigate the system per¬ 
formance for different scenarios and using more rep¬ 
etitions than would be feasible in the real world. 

The main goal of our experiments is to compare 
the performance of the original bee-inspired algo¬ 
rithm with the newly proposed approach that has the 
telepresence robot in the system. We evaluate the 
proposed approach in simulation for 3 different en¬ 
vironments: 5x5 meters square shaped, 10x10 meters 
square shaped, and 10x10 meters L-shaped, shown in 
Figure 3. 

In the first case, we compare the performance of the 
swarm for different numbers of Turtlebots involved 
in the foraging task. We evaluate the throughput, the 
speed of convergence and the efficiency of the forag¬ 
ing process with and without the shepherding telep¬ 
resence robot. We also collect statistics showing the 
user effort, expressed as the number of times the user 
interfered (i.e. corrected a Turtlebot’s navigation), 
and the total distance driven by the telepresence robot. 
We repeat the same experiments in the 10x10 world 


and in the 10x10 L-shaped environment with 9 Turtle¬ 
bots, and for these cases we evaluate the convergence 
of the algorithm after moving the food to a differ¬ 
ent location, e.g. due to depletion of the first food 
source. Each experiment lasts until 50 food units have 
been transported from the source to the hive. Simi¬ 
larly, in the 10x10 environments, a food source be¬ 
comes “depleted” after 50 food units, upon which a 
new source becomes active. Every experiment is re¬ 
peated 10 times, and the results are averaged. 

Discussion 

Figure 4 shows the results of simulations in the 5x5 
world. In this relatively small environment the swarm 
will often converge without the interference of the 
telepresence robot, except for a few cases when the 
number of robots gets too large for the environment,
leading to collisions, and robots getting stuck. How¬ 
ever, through minimal user effort the shepherd still 
improves the efficiency of the process. 

Figure 4(a) shows the total time, in seconds, 
needed to complete the task (i.e., transport 50 food 
units), with the error bars representing the standard 
deviation intervals. We observe that the optimal 
swarm size is reached at 6 robots, both with and with¬ 
out shepherd. When the swarm size increases beyond 
this point, the small environment becomes too clut- 
Figure 4: Results for the 5x5 world with different swarm sizes.


tered as robots start colliding, hindering each other’s 
performance. The same trend can be observed when 
looking at the total throughput in Figure 4(b), mea¬ 
sured in units of food delivered per minute. Here 
again we see that the optimum is reached for a swarm 
size of 6 robots. Shepherding significantly improves 
the performance of the swarm in both cases. 

We also investigate the convergence performance 
of the system. In Figure 4(c) we show the time needed,
in seconds, until the whole swarm has converged,
meaning that all robots are aware of the food location and
are continuously going back and forth between the 
hive and food location to transport food units. The 
figure shows that the time needed to converge stays 
more or less stable up to 5 robots, after which the 
environment becomes more cluttered, preventing the 
robots from converging quickly. Additionally, the 
converged state can be lost again, e.g. due to collisions,
or robots driving in each other’s line of sight
preventing them from relocating the food. In Fig¬ 
ure 4(d) we plot the percentage of experiment time 
during which the whole swarm is converged, and 
note that this value decreases approximately linearly 
with an increasing swarm size. The fact that robots 
may get in each other’s way can also be observed 
by looking at the total distance travelled (in meters) 
by the swarm during the course of one experiment, 
which increases exponentially with the swarm size 
(Figure 4(e)). This shows that even though a swarm 
of size 6 is optimal in both time and throughput, it is 


not necessarily the most efficient in terms of per-robot
performance. 

Finally, in Figure 4(f) we look at the effort required 
by the user to guide the swarm. The figure shows the 
distance travelled by the telepresence robot, as well as 
the number of interferences, i.e. the number of times 
that the user has corrected a swarm robot’s navigation
target. We can see that the required effort does not
necessarily grow with the number of swarm robots, 
indicating that the robots are able to relay the new in¬ 
formation among the swarm. 

We now move on to the larger environment. Table 
1 shows the results for the 10x10 world (with values
in parentheses representing the standard deviation)
with and without moving the food source. In
both experiments the shepherd can significantly im¬ 
prove the performance of the system. In particular, 
after moving the food source the swarm without shep¬ 
herd takes more than twice as long to re-converge 
(third column in the table) as the swarm with shep¬ 
herd. Also note that when moving food, without shep¬ 
herd the swarm only fully re-converged in 3 out of 10 
runs, while with shepherd this happened in 9 out of 
10 runs. 
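
The figures in Table 1 can be cross-checked with a few lines of arithmetic. The sketch assumes that the static scenario ends after 50 food units, as stated for each experiment, and that times are in seconds; because the reported throughput was presumably averaged per run, recomputing it from the mean total time is only an approximate consistency check. The helper `throughput_per_minute` is introduced here for illustration.

```python
def throughput_per_minute(food_units, total_time_s):
    """Food units delivered per minute for one experiment."""
    return food_units / (total_time_s / 60.0)


# Static food source, mean total times taken from Table 1.
static_without = throughput_per_minute(50, 510.5)
static_with = throughput_per_minute(50, 389.7)

# Re-convergence after moving the food: without vs. with shepherd.
reconvergence_ratio = 316.9 / 146.7
```

Rounded to one decimal, the two throughput values agree with the 5.9 and 7.7 units per minute reported in the table, and the ratio of re-convergence times is slightly above two.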

Results for the 10x10 L-shaped environment are 
shown in Table 2. Again, shepherding significantly 
improves the performance of the swarm with rela¬ 
tively limited effort. However, this task is clearly 
harder, as the food source is moved twice. A break¬ 
down of time to re-convergence, as well as the num- 
Table 1: Results for the 10x10 world with and without moving the food source.

                 Total time     Time to        Time to        % of time     Throughput   Shepherd       Times
                                conv. (1)      conv. (2)      converged                  distance       interfered
Static
  W/o shepherd   510.5 (56.4)   304.7 (68.5)   n/a            38.5 (4.9)    5.9 (0.6)    n/a            n/a
  With shepherd  389.7 (12.9)   191.9 (42.4)   n/a            47.7 (10.1)   7.7 (0.3)    44.8 (12.6)    8.0 (2.8)
Moving food
  W/o shepherd   922.6 (68.2)   328.1 (75.1)   316.9 (81.7)   20.0 (5.6)    6.5 (0.5)    n/a            n/a
  With shepherd  677.7 (42.9)   181.2 (51.3)   146.7 (37.7)   46.0 (11.3)   8.8 (0.5)    88.9 (19.4)    20.4 (3.2)


Table 2: Results for the L-shaped world with and without shepherding.

                 Total time      Throughput   Distance traveled  Shepherd distance  Times interfered
W/o shepherd     1068.8 (167.4)  8.3 (1.0)    3347.9 (466.6)     n/a                n/a
With shepherd    895.2 (73.4)    10.1 (0.8)   2984.5 (96.9)      121.2 (14.0)       26.8 (6.6)


Table 3: Break-down of convergence times in the L-shaped world with and without shepherding. Food is moved
twice. Convergence times are listed for the three food locations, as well as the number of runs that did converge.

                 Time to conv. (1)  Time to conv. (2)  Time to conv. (3)  Nr. of conv. (1)  Nr. of conv. (2)  Nr. of conv. (3)
W/o shepherd     223.3 (57.0)       235.6 (91.3)       278.2 (60.7)       8/10              6/10              4/10
With shepherd    188.9 (73.2)       137.6 (39.0)       201.4 (67.6)       10/10             10/10             9/10


ber of runs that fully re-converged, is given in Table 3. 
Both metrics are significantly improved by the shep¬ 
herd. We note that the first time the food is moved, 
the shepherd is able to make a big difference, as the 
distance between both food locations is easy to
overcome. In contrast, the last food location lies on the
opposite side of the L-shape, around the corner. This
makes it harder for the swarm to re-locate the food, 
even with the help of the shepherd. 

Demonstration 

We have also undertaken a real-world experiment in 
which 5 Turtlebots are foraging in an unknown en¬ 
vironment. All the robots are initially located around 
the hive and they start to explore the environment ran¬ 
domly for the food location. An operator supervises 
the group using the MITRO telepresence robot. The 
user is able to send the food location information to 
individual Turtlebots, e.g. when they get stuck. A 
video showing this demonstration can be found
online. 3 In this physical implementation the shepherd
robot increases the efficiency of the foraging process
and speeds up the convergence of the Turtlebots, especially
when the food is moved.

Conclusion and further work 

We have proposed a new approach for swarm robotics 
systems, which is based on both the coordination 
principle of honeybees and on human-robot interac¬ 
tion through telepresence robotics. In order to vali- 

3 http://smartlab.csc.liv.ac.uk/shepherding/


date the approach we performed swarm experiments, 
i.e., a foraging task in an unknown environment, both
in simulation and in a situated environment. Our re¬ 
sults show that the telepresence robot, acting as a 
shepherd, can substantially increase the efficiency of 
the foraging process, especially in dynamic and com¬ 
plex scenarios, in which food sources change over 
time. Only a limited effort by the telepresence robot 
can already make a great difference in performance. 
In future work we aim to integrate an augmented 
telepresence robot in a swarm, allowing interaction 
between a human operator and the multi-robot sys¬ 
tem in a complex, potentially dangerous surveillance 
task. The human operator would be able to steer the 
behaviour of the swarm from a remote location by 
means of direct communication. 
Abstract

The Web has created new opportunities for interactive prob¬ 
lem solving and design by large groups. In the context of 
robotics, we have shown recently that a crowd of non-experts
is capable of designing adaptive machines over the Web.
However, determining the degree to which collective contri¬ 
bution plays a part in these tasks requires further investiga¬ 
tion. We hypothesize that there exist subtle yet measurable 
social dynamics that occur during the collaborative design of 
robots on the Web. To test this, we enabled a crowd to rapidly 
design and train simulated, web-embedded robots 1 . We com¬ 
pared the robots designed by a socially-interacting group of 
individuals to another group whose members were isolated 
from one another. We found that there exists a latent quality 
in the robots designed by the social group that was signif¬ 
icantly less prevalent in the robots designed by individuals 
working alone. Thus, there must exist synergies in the former 
group that facilitate this design task. We also show that this 
latent quantity correlates with the desired design outcome, 
which was fast forward locomotion. However, the quantity - 
when distilled into its component parts - is not more preva¬ 
lent in one group than another. This finding demonstrates that 
there are indeed traces left behind in the machines designed 
by the crowd that betray the social dynamics that gave rise to 
them. Demonstrating the existence of such quantities and the 
methodology for extracting them presents opportunities for 
crafting interfaces to magnify these synergies and thus im¬ 
prove collective design of robots over the web in particular, 
and crowd design activities in general. 

Introduction 

The Web has led to novel modes of social participation 
(DiMaggio et al. (2001)) and means for contribution to 
tasks that were previously limited to small groups of ex¬ 
perts (Khatib et al. (2011); Lintott et al. (2008); Gowers 
and Nielsen (2009)). New Web technologies, such as WebGL
3D graphics and Web-embedded physics engines, have
made the interactive design of intelligent machines possi¬ 
ble over the Web (Moore et al. (2014)). Additionally, col¬ 
lective intelligence methods (Quinn and Bederson (2011)) 
such as crowdsourcing (Howe (2006)), human computation 

1 For a video overview of the experiment, see
https://youtu.be/ODr-lacPKPQ 


(Von Ahn (2009)) and social computing (Wang et al. (2007); 
Parameswaran and Whinston (2007)) can be used to incor¬ 
porate contributions of large groups of non-experts - the 
‘crowd’ - into the design of robots on the Web (Wagy and 
Bongard (2014)). However, the ways that people involved 
in design and problem-solving tasks synergize remain an
open question. 

Under certain circumstances, collectives have computa¬ 
tional abilities not readily available to the group members in 
isolation (Couzin (2007)). However, the ability of a group 
of people to socially coordinate problem solving efforts has 
limits in physical social interactions (Dunbar (1992)). These 
limitations have also been shown to persist in Web interactions 
(Gonçalves et al. (2011)). 

We seek to better understand whether a crowd of humans 
interacting socially through the Web can contribute non-destructively, 
and potentially in a superadditive way (Page 
(2008)) to collective problem solving. Whether, and in what 
ways, constructive social computing phenomena arise in hu¬ 
man interactions on the Web remains to be seen. Previous 
studies (Khatib et al. (2011); Lintott et al. (2008); Gow¬ 
ers and Nielsen (2009)) have demonstrated that a crowd of 
individuals can complete problem-solving and design tasks 
while working in parallel. However, the ways in which 
the crowd can collectively exhibit abilities that differ from 
those of an aggregate of individual contributions remain 
under-explored. The present study addresses how the contributions 
of the crowd as a social entity can be measured as distinct 
from the contributions of isolated individuals working 
in parallel. 

We have shown that the crowd is capable of collectively 
designing adaptive machines on the Web (Wagy and Bon¬ 
gard (2014, 2015a)). Thus there is evidence that under some 
circumstances social synergy can arise in this domain. How¬ 
ever, past studies have not uncovered the imprint left behind 
on the crowd-generated designs that result from this synergy. 

We hypothesize that this social contribution is a measur¬ 
able quantity. If this quantity is indeed measurable, then it 
could be actively managed. If it constructively contributes 
to the task at hand, it can be enhanced. If it is destructive, 
it could be actively suppressed. In this study, we seek to 
demonstrate whether or not the quantity is measurable; and 
if so, whether it is constructive or destructive with respect to 
the design of robots. Developing automated means for the 
discovery of crowd contributions is the first step to actively 
manage crowd participation towards productive, collective 
outcomes. 

Evidence indicates that humans may be biased, during de¬ 
sign, by exposure to the physical environment in which we 
are embodied. For example, people may favor symmetric 
robot designs as symmetry is ubiquitous in nature. If in¬ 
corporated into the physical characteristics of artificial or¬ 
ganisms, these biases facilitate design (Wagy and Bongard 
(2015a)). However, we still do not know whether these bi¬ 
ases arise individually or if the bias is reinforced by crowd 
behavior. If these biases are the result of social pressures, 
they could be actively enhanced or suppressed depending on 
whether they were beneficial to the desired design outcome. 

Previous work suggests there may exist as-yet undiscov¬ 
ered latent contributions from the crowd (Wagy and Bongard 
(2015b)). However, the methods suffer from a deficiency: 
they obscure underlying features of the crowd-generated de¬ 
signs that may contribute to a positive outcome. Addition¬ 
ally, we do not know whether design features were indeed 
influenced by social processes. This stands in contrast to the 
present work, in which we demonstrate a method for deriv¬ 
ing contributions as linear combinations of simple geometric 
values. 

In the present work, we distill a latent geometric fac¬ 
tor from robot designs using Singular Value Decomposition 
(SVD). We demonstrate that this value is more prevalent in a 
social design process than it is in designs resulting from the 
parallel effort of isolated individuals. Furthermore, we show 
that this latent factor has a beneficial effect on the objective 
of the design process. 

Methods 

In this study, we used a dataset of user-contributed robot de¬ 
signs (Wagy and Bongard (2015a)) 2 . Participants designed 
robots using an interactive tool available through their web- 
browser. When a user visited the study website, they were 
shown a 5 x 5 grid of dots. By clicking on a dot and drag¬ 
ging to another dot, they could draw line segments between 
dots (e.g., Fig. 1). Line segments were only allowed between 
horizontally- or vertically-adjacent dots to constrain 
users from crossing segments in their designs. In order to 
enforce this constraint, the closest set of adjacent line segments 
that approximated the diagonal line drawn by the user 
was shown as the user dragged lines between dots. For ex¬ 
ample, a line drawn from the top left dot to the bottom right 
dot was approximated by a zig-zag formation of smaller, ad¬ 
jacent segments between the top left and bottom right dots. 
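
The snapping behavior described above can be sketched as follows. This is an illustrative sketch rather than the study's implementation: the function name `snap_to_grid_path` and the greedy rule of stepping along whichever axis has the larger remaining offset are assumptions made here for illustration.

```python
def snap_to_grid_path(start, end):
    """Approximate a dragged line by unit segments between adjacent dots.

    start, end: (row, col) dots on the grid.
    Returns a list of ((r0, c0), (r1, c1)) unit segments, each joining
    horizontally or vertically adjacent dots.
    """
    (r, c), (r_end, c_end) = start, end
    segments = []
    while (r, c) != (r_end, c_end):
        dr, dc = r_end - r, c_end - c
        # Step along the axis with the larger remaining offset; ties go to
        # the column axis. This greedy rule is an assumption of the sketch.
        if abs(dc) >= abs(dr):
            nxt = (r, c + (1 if dc > 0 else -1))
        else:
            nxt = (r + (1 if dr > 0 else -1), c)
        segments.append(((r, c), nxt))
        r, c = nxt
    return segments


# Dragging from the top-left dot to the bottom-right dot of the 5 x 5 grid
# yields a zig-zag of eight unit segments.
path = snap_to_grid_path((0, 0), (4, 4))
```

Under this rule a diagonal drag alternates between horizontal and vertical unit steps, which is one way to obtain the zig-zag formation described in the text.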

2 For code and data used in this experiment, see 
https://github.com/mwagyuvm/dotbot-latent-social 

Figure 1: Grid of dots for designing robot bodies by the 
crowd. An example design is shown drawn on the grid. 
Users could click on a dot and drag to another dot. Lines 
were only allowed between adjacent dots. If a user dragged 
to a dot that was not adjacent, the path of adjacent lines that 
best approximated the dragged path was generated as the de¬ 
signer drew the line. 


When a participant was finished designing a robot, she 
could click a ”GO” button, which launched an instance of 
her design as a 3D robot in a physics simulation engine 
within her browser. The simulation contained only the robot 
and an infinite, flat ground upon which the robot could walk. 

Line segments drawn in the grid of dots were translated 
into the physics simulation as 3D rectangular parallelepiped 
robot segments. Each dot that was adjacent to at least one 
line segment was invoked as a 3D cube in the physics sim¬ 
ulation. Segments adjacent to a cube were connected to the 
respective cube with a hinge joint along the axis at the mid¬ 
point between adjoining cube and segment faces and in par¬ 
allel with the ground and these body faces. In this way, 
the robot was able to push in the direction of the ground. 
However, as segments flexed and the robot body moved, the 
robot configuration did allow for sweeping motions across 
the ground by its members. 

Every joint was actuated with a sinusoidal, displacement- 
controlled signal. All sinusoidal signals driving the hinge 
joints swept the same angle (±45°) at the same frequency 
(1.5 Hz). However, the phase of the signal could take on one 
of two possible values: 0° or 180°. A hill-climber search al¬ 
gorithm was used to define which of these phase values was 
assigned to each of the hinges in the robot created by the 
user. We will use the term phase configuration to refer to a 
single assignment of phase values to hinge joints for a par¬ 
ticular robot body. A separate hill-climber algorithm was 
maintained for each robot design and for each group. Thus 
the first instance of a particular robot morphology within a 
group was assigned a random phase configuration. When 
that same user or another participant in the same group drew 
and simulated an identical robot morphology, the robot was 
assigned a slightly altered version of the original phase con¬ 
figuration. Thus, each time a user clicked GO, they con¬ 
tributed one iteration of the hill-climber algorithm for a par¬ 
ticular robot body. 
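
A minimal sketch of this control scheme is given below. The amplitude, frequency and the two phase values come from the text; everything else is assumed for illustration, in particular the class name `PhaseHillClimber`, the single-hinge flip used as the "slightly altered" configuration, and the rule that a candidate replaces the current best when it travels at least as far.

```python
import math
import random

AMPLITUDE_DEG = 45.0        # every hinge sweeps +/-45 degrees
FREQUENCY_HZ = 1.5          # every hinge oscillates at 1.5 Hz
PHASES_DEG = (0.0, 180.0)   # the two allowed phase values


def joint_angle(t, phase_deg):
    """Target hinge angle in degrees at simulation time t (seconds)."""
    return AMPLITUDE_DEG * math.sin(
        2.0 * math.pi * FREQUENCY_HZ * t + math.radians(phase_deg)
    )


class PhaseHillClimber:
    """One hill-climber per (group, morphology) pair (names are hypothetical)."""

    def __init__(self, n_joints, rng):
        self.rng = rng
        # The first instance of a morphology gets a random configuration.
        self.best = [rng.choice(PHASES_DEG) for _ in range(n_joints)]
        self.best_distance = None  # nothing has been simulated yet

    def propose(self):
        """Return the phase configuration to simulate on the next GO click."""
        if self.best_distance is None:
            return list(self.best)
        candidate = list(self.best)
        i = self.rng.randrange(len(candidate))
        # Assumed mutation: flip one hinge between 0 and 180 degrees.
        candidate[i] = 180.0 if candidate[i] == 0.0 else 0.0
        return candidate

    def report(self, candidate, distance):
        """Assumed acceptance rule: keep candidates that go at least as far."""
        if self.best_distance is None or distance >= self.best_distance:
            self.best, self.best_distance = list(candidate), distance
```

In this sketch each GO click corresponds to one `propose` call followed by one `report` call with the distance measured in the simulation.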

Each robot design was simulated for 15 seconds of 
physics simulation time (for examples of robots in the web- 
embedded physics simulation, see Fig. 2). The distance that 
the robot traveled in these 15 seconds was recorded in a 
database along with the adjacency matrix that defined the 
connections. 



Figure 2: Robots were simulated in a web-embedded 
physics simulator for 15 seconds. Each line that a user drew 
was translated into a segment in the robot and each dot that 
was adjacent to a line was translated into the physics sim¬ 
ulation as a cube. The cubes and segments were attached 
with a hinge joint, which was actuated by a motor with 
displacement-controlled sinusoidal signal. Red dots indicate 
the starting point from which the robot started in the simu¬ 
lation. 

Participants were asked to design a robot that could move 
as far as possible across the flat plane and within the allotted 
15 seconds of simulation time. However, participants were 
unpaid volunteers so they were free to use the tool however 
they desired. 

Participants were placed into either a control group or ex¬ 
perimental group with an equal probability of being placed 
into either group. Participant IP addresses were recorded so 
that if they were to return to the site to design more robots, 
they would be placed in the same control or experimental 
group. In a panel at the top of the study website, the ex¬ 
perimental group was shown 13 randomly chosen (2D) de¬ 
signs created by other users in their group. We refer to this 
group of robot designers as the social group because they 


were given the chance to utilize other users’ designs if they 
so desired. Every time a user returned to the site, they would 
be shown a potentially new random selection of 13 designs 
created by other users that were placed in the experimental 
group. The control group, which we will refer to as the independent 
group, was shown only their own past designs. Thus 
the user interfaces for the independent and social groups 
were identical apart from the content of the panels showing 
historical designs at the top of the site. 

When the crowdsourced portion of the experiment was 
complete, we used the collected robot designs to compare 
the design preferences for users in the social and indepen¬ 
dent groups. 

Since robot designs consisted of a series of points and 
edges joining these points, we were able to compute network 
metrics on the resulting dataset and derive a set of simple 
geometric measures from each robot designed by the crowd. 
These geometric measures included minimum, average and 
maximum degree measures; maximal matching; length of 
the shortest path; node connectivity; number of legs; num¬ 
ber of segments; radius; transitivity; number of cliques; in¬ 
dicators of bipartiteness, regularity, whether the network is 
a tree and biconnectedness; and symmetry (computed as the 
maximum proportion of segments that are matched with an¬ 
other segment across either the horizontal, vertical or one of 
two diagonal axes of symmetry in the 2D design plane). 
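
As an illustration of how such measures can be computed from a design, the sketch below derives the degree of each dot, a leg count and a symmetry score from a list of segments. The definitions are assumptions made for this sketch: a leg is taken to be a segment with one free (degree-1) end, reflections are taken about the center of the 5 x 5 grid, and a segment lying on the axis is counted as matching itself.

```python
from collections import Counter

GRID = 5  # dots per side of the design grid


def degrees(segments):
    """Number of segments touching each dot used by the design."""
    deg = Counter()
    for a, b in segments:
        deg[a] += 1
        deg[b] += 1
    return deg


def count_legs(segments):
    """Assumed definition: a leg is a segment with a degree-1 endpoint."""
    deg = degrees(segments)
    return sum(1 for a, b in segments if deg[a] == 1 or deg[b] == 1)


# The four reflections named in the text, taken about the grid center.
REFLECTIONS = (
    lambda r, c: (GRID - 1 - r, c),             # horizontal axis
    lambda r, c: (r, GRID - 1 - c),             # vertical axis
    lambda r, c: (c, r),                        # main diagonal
    lambda r, c: (GRID - 1 - c, GRID - 1 - r),  # anti-diagonal
)


def symmetry(segments):
    """Largest share of segments whose mirror image is also in the design."""
    present = {frozenset(s) for s in segments}
    if not present:
        return 0.0
    best = 0.0
    for reflect in REFLECTIONS:
        matched = sum(
            1 for s in present
            if frozenset(reflect(r, c) for r, c in s) in present
        )
        best = max(best, matched / len(present))
    return best


# A small T-shaped example design: a two-segment bar with a two-segment stem.
t_shape = [
    ((0, 1), (0, 2)), ((0, 2), (0, 3)),  # bar along the top row
    ((0, 2), (1, 2)), ((1, 2), (2, 2)),  # stem down the middle column
]
```

For this example the sketch reports three legs, a maximum degree of three and a symmetry of 1.0 (about the vertical axis).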

We then distilled this set of computed geometric measures 
for each robot morphology into just one representative value 
of its geometric features. We did this by using the Singular 
Value Decomposition (SVD) dimensionality reduction technique 
(Rajaraman et al. (2012)) to factorize the matrix of all 
descriptive geometric features into component singular values 
and matrices. We then used only the first singular value 
of the decomposition to obtain a one-dimensional representation 
of the geometric robot feature-space, which allowed 
us to represent each design by a single number that combined 
all computed geometric features into one quantity. This 
reduced-dimensional representation of robot morphologies 
will henceforth be referred to as the latent factor representa¬ 
tion of a particular robot’s morphology, or the latent factor 
in short. 
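
The reduction to a single number can be sketched as below. This is not the code used in the study: it assumes that the latent factor of a design is the projection of its feature row onto the leading right singular vector of the feature matrix, it does not center or scale the features, and it finds that vector by power iteration in plain Python rather than by a library SVD routine. The feature rows are made up for illustration.

```python
import math


def first_right_singular_vector(rows, iterations=200):
    """Power iteration on A^T A.

    Converges to the leading right singular vector of A when the largest
    singular value is strictly dominant and A is not all zeros.
    """
    n_features = len(rows[0])
    v = [1.0 / math.sqrt(n_features)] * n_features
    for _ in range(iterations):
        # w = A^T (A v), accumulated one design (row) at a time.
        w = [0.0] * n_features
        for row in rows:
            proj = sum(x * y for x, y in zip(row, v))
            for j, x in enumerate(row):
                w[j] += proj * x
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def latent_factors(rows):
    """Project every design onto the leading right singular vector."""
    v = first_right_singular_vector(rows)
    return v, [sum(x * y for x, y in zip(row, v)) for row in rows]


# Made-up feature rows (symmetry, number of legs), for illustration only.
features = [
    [1.0, 0.0],
    [1.0, 0.0],
    [0.5, 0.0],
    [1.0, 1.0],
]
weights, factors = latent_factors(features)
```

Here `weights` holds one coefficient per geometric feature and `factors` holds one latent value per design, so each design is summarized by a linear combination of its features.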

Results 

A sample of designs created by participants can be seen in 
Fig. 3. The T-shaped robot design in the center was the high¬ 
est performing design overall (the best distance it was able 
to achieve was approximately 32 meters). 

Summaries of team contribution are indicated in Table 1. 

The distribution of distances achieved by the social group 
and that of the independent group can be seen in Fig. 4. The 
social group achieved significantly greater distances than the 
independent group (p < 0.0001; Kolmogorov-Smirnov test, 
D = 0.14151). 

Figure 3: A sample of robot designs by participants. The highest performing robot (with regard to distance-traveled) was the 
T-shaped robot in the center of the designs shown. 

                         Independent   Social
Total Contributions      2825          2984
Total Unique Designs     1245          1136
Number of Participants   364           398

Table 1: Social and independent group contributions. 

Figure 4: Deciles of distances achieved by robot designs in 
the group working together (Social) and those achieved by 
individuals working in isolation (Independent). The median 
distance value is indicated by a black diamond. 

The minimum, mean and maximum values of the latent 
factor and the quantities that are used to compute it are 
shown in Table 2. 

The dimensionality reduction technique resulted in two 
non-negligible components (coefficients > 0.00001) that 
contributed to the latent factor. These contributions were the 
robot’s number of legs (coefficient of 0.01) and the symmetry 
of the robot body (coefficient of 0.99). Values for symmetry 
could range from a minimum of no symmetry (0.0) to a 
maximum value of 1.0, indicating perfect symmetry about 
at least one of the horizontal, vertical or diagonal reflective 
axes. The number of legs found in a design ranged from a minimum of 
0, representing designs whose segments were all connected 
at both ends, up to a maximum of 12 legs. 

                   Independent        Social
                   (min/mean/max)     (min/mean/max)
Number of Limbs    (0/0.734/12)       (0/0.656/12)
Symmetry           (0/0.882/1.00)     (0/0.945/1.00)
Latent Factor      (0/0.880/1.10)     (0/0.942/1.11)

Table 2: Latent factor range and ranges of values used to 
compute latent factor in designs by both groups. 

There was not a significant difference in the symme¬ 
try of the designs created by the independent group com¬ 
pared to those created by the social group (p = 0.132; 
Kolmogorov-Smirnov test, D = 0.047829), nor was there 
a significant difference between the distribution of the num¬ 
ber of legs in designs created by the social group and the 
independent group (p = 1.0; Kolmogorov-Smirnov test, 
D = 0.0082534). However, there was a significant dif¬ 
ference in the values for latent factor when comparing the 
distribution of designs created by the independent group 
and the social group (p < 0.01; Kolmogorov-Smirnov test, 
D = 0.068714). 
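
For readers unfamiliar with the statistic reported above, the sketch below computes the two-sample Kolmogorov-Smirnov statistic D in plain Python. It is only an illustration of the definition, with made-up samples; it does not compute the p-value, and it is not the analysis code used in the study.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the largest absolute gap between the two empirical cumulative
    distribution functions.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:  # the gap can only be maximal at an observed value
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Made-up samples, for illustration only.
d_example = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```

A library routine such as SciPy's `scipy.stats.ks_2samp` returns both the statistic and a p-value for two samples, which is the kind of pair of values quoted in the text.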

The 5 designs with the highest latent factor value can be 
seen in Fig. 5. 

Discussion 

The only difference between the social group and the independent 
group was that the social group was able to see designs 
created by others in their group. Thus, if a quantity 
derived from the designs was more prevalent in the social 
group versus the independent group, then this quantity was 
the result of social dynamics. 

We derived a measurable latent factor from designs created 
by each group. We found that this latent value was significantly 
more prevalent in the social group than in the 
independent group. Thus exposure 
to others’ designs resulted in increased prevalence of 
this factor. 

Latent Factor   Symmetry   # Legs
1.11            1.0        12
1.11            1.0        12
1.10            1.0        11
1.10            1.0        11
1.09            1.0        10

Figure 5: Crowd designs with the highest latent factor val¬ 
ues and corresponding symmetry and number of legs calcu¬ 
lations. 
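
The values shown in Fig. 5 are consistent with reading the latent factor as a weighted sum of the two non-negligible components, using the coefficients reported in the Results (0.99 for symmetry and 0.01 for number of legs). The sketch below is a check under that assumption, using the rounded coefficients as published; the function name `latent_factor` is introduced here for illustration.

```python
SYMMETRY_WEIGHT = 0.99  # coefficient reported in the Results (rounded)
LEGS_WEIGHT = 0.01      # coefficient reported in the Results (rounded)


def latent_factor(symmetry, n_legs):
    """Assumed form: linear combination of the two reported components."""
    return SYMMETRY_WEIGHT * symmetry + LEGS_WEIGHT * n_legs


# Fully symmetric designs with 12, 11 and 10 legs, as in Fig. 5.
for legs in (12, 11, 10):
    print(legs, round(latent_factor(1.0, legs), 2))
```

With perfect symmetry, 12, 11 and 10 legs give 1.11, 1.10 and 1.09 after rounding to two decimals, matching the figure. Because the symmetry coefficient dominates, the number of legs only separates designs whose symmetry scores are close.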

A synergy is, by definition, a collective outcome that is 
greater than the sum of individual contributions. Demon¬ 
stration of a synergy is not predicated on the collective out¬ 
come being constructive with respect to the objective of the 
efforts. However, there is evidence that the latent factor that 
we have distilled contributes constructively to robot design. 

This latent, socially-transmitted value incorporated mor¬ 
phological symmetry, which has been shown to be beneficial 
for robot locomotion (Bongard and Paul (2000)). While the 
incorporation of symmetry in user designs alone was not sig¬ 
nificantly different in the social versus independent groups, 
the associated p-value (p = 0.132) indicates a trend that 
those in the social group may have favored symmetry in designs 
over those working independently. It is only the 
incorporation of the number of legs in the design that differentiates 
the social design tendency from the isolated 
design tendency. 

It appears that there is a latent quantity that - through ex¬ 
posure to designs by other users - is increasing, consciously 
or otherwise in the social participants. And the social group 
does indeed design robots that, on average, outperform those 
designs by participants in the individual group. However, 
we cannot say with confidence that it is this latent value that 
leads to the improved performance. There may be other, 
undiscovered factors that lead to the superior performance 
by the social group. However, we did find that there is a 
positive correlation between the discovered latent factor and 


the distance that a robot is able to travel (Pearson correlation 
= 0.2546499). Thus, while we cannot say for certain that 
we found the latent factor that contributed to the success of 
the social group, we can say that - through the distillation of 
design decisions by a large group of non-expert contributors 
- we found a quantity that correlates with the desired problem 
outcome. This quantity is, in part, corroborated 
by findings in the scientific literature on locomotion (Bongard and Paul (2000)) 
that were unknown to the non-expert participants 
in the study. 
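
The correlation reported above is a Pearson coefficient between each design's latent factor and the distance it traveled. A minimal sketch of that computation follows; the data shown are made up for illustration and the function name `pearson` is introduced here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up latent factors and distances, for illustration only.
latent = [0.50, 0.99, 1.02, 1.11]
distance = [2.0, 9.0, 30.0, 7.0]
r = pearson(latent, distance)
```

A coefficient near 0.25, as reported in the text, indicates a positive but weak linear association, which is why the text stops short of attributing the social group's performance to the latent factor.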

The social group designed robots that were capable of 
moving significantly farther than those designed by individ¬ 
uals in isolation. As can be seen in Fig. 4, the distribution 
of distances achieved by the social group’s robots has a higher 
median value than that of the independent group. The distribution 
of distances achieved by the social group also has a much greater 
spread in the upper deciles than that of the independent group. Here 
we are comparing the overall ability of the participants in the 
social group with that of the independent group to design 
robot bodies in tandem with their control strategy. This is in 
contrast to the performance of the robot morphologies that 
are independent of their control strategies as described in 
(Wagy and Bongard (2015a)). Thus users were able to work 
together to build robot body/brain combinations that outper¬ 
formed the body/brain combinations of those that worked 
alone. This need not have been the case: well-known group 
pathologies such as groupthink (Janis (1972)) could have re¬ 
sulted in an echo-chamber effect. Users working together 
could have missed promising design possibilities due to a 
focus on limited regions of the space of possible designs. 
Also, the number of participants working collectively in this 
study (398) far exceeds the number of participants considered 
optimal for effective social interactions (Dunbar (1992)). 

However, there could be a trivial reason for the social 
group’s superior design performance. As described in the 
Methods section, a hill-climber was assigned to each robot 
morphology; and each hill-climber instance was shared be¬ 
tween members in a group. It is possible that the designs that 
received the most attention by the social group simply were 
given more attempts at finding a good control strategy by de¬ 
voting more hill-climber iterations to the search for a good 
controller. Thus groupthink may have led the social group to 
focus on a single design’s control strategy rather than con¬ 
centrating efforts on finding an optimal morphology. If this 
had been the case, we would have seen that the designs with 
the most hill-climber iterations are also the highest perform¬ 
ing designs. However, this is not what we see in the results. 
Referring to Fig. 6, we do see that several designs in the 
social group benefited from many more hill-climber itera¬ 
tions than the independent group. But these designs were 
not among the high-performing instances. In contrast, many 
of the highest-performing designs are those that received 10 
or fewer hill-climber iterations. This is far fewer than some 
designs, which received in excess of 100 iterations devoted 
to finding the best controller. Thus we can see that the robot 
morphology - not just the number of hill-climber iterations 
- was an important component of the robot’s performance. 
Therefore we believe that the social group did not rely solely 
on the focused search efforts for the best control strategy on 
a limited group of designs. 


Figure 6: Mean distance achieved by each robot design compared 
to the number of hill-climber iterations that that design 
received. A number of designs in the social group did 
receive substantially more iterations than those in the inde¬ 
pendent group, but they were not high-performing designs. 
In contrast, some of the highest-performing designs received 
very few iterations. 

We claim here that the latent factor discovered is an in¬ 
dicator of synergy in the social group. This value is signifi¬ 
cantly more prevalent in the social group versus independent 
group. And the only difference between the two groups was 
the opportunity to synergize. Therefore the only ways that 
the latent factor could arise are through synergy or by chance. 
It is unlikely that the value arose by chance, as indicated by 
the statistical tests performed. However, it has been shown 
that crowdsourced work follows a heavy-tailed distribution 
(Swain et al. (2015)). Most participants contribute very little 
to a crowdsourced activity and a single user or small group 
of users contributes vastly more than others. This study fol¬ 
lowed that same trend (see Figure 7). Therefore, it is pos¬ 
sible that by chance the top contributing user in the social 
group favored the latent factor and biased the overall group 
towards prevalence of this value. 

We investigated whether the designs by the top- 
contributing participant in the social group biased the com¬ 
parison of the latent factor between the groups. We removed 
the social group member that contributed the most designs 
from the social group and compared the mean latent factor 


values for users in each group. Even without this top con¬ 
tributor, we found significantly more of the latent factor in 
the social group versus the independent group (p = 0.011; 
Kolmogorov-Smirnov test). 





Figure 7: Contributions to this study follow a heavy-tail dis¬ 
tribution. The number of designs contributed by most users 
was small whereas the number of designs contributed by one 
very enthusiastic user was very high. 

Note that the designs with the highest scores of the latent 
factor are those that maximize both the symmetry measure 
- a maximum of 1.0 - and the number of legs. The maxi¬ 
mum number of legs found in designs created by the crowd 
was 12, which is also the maximum number of legs possible 
when designing single-component, connected robots 
in this design space. In fact, 8 unique members of the social 
group drew designs that maximized the number of legs, 
whereas no users in the independent group drew designs 
that maximized this quantity. The maximum number 
of legs found in the independent group was 11. Thus, the de¬ 
signs that maximize the latent factor (Fig. 5) are those with 
the most legs possible. However, despite having the high¬ 
est possible symmetry score, the top performing design of 
all designs only has 3 legs (the largest, T-shaped design in 
Fig. 3). We cannot say that the best design was the result 
of this increased latent factor in the social group. Whether 
the prominence of the latent factor influenced the creation 
of the best designs requires further exploration. Future work 
will address the progression of features that lead to specific, 
optimal outcomes such as the T-shaped robot in Fig. 3. 

We can get a sense of the participation of users that contributed 
to the top designs by looking at Table 3. Repeating 
numbers across rows in the table indicate that the same user 
contributed multiple hill-climber runs to a particular design. 
Two patterns can be seen in these top 10 designs. The top 
design (Design Rank = 1) was created by a 
user that only contributed 9 runs total to the overall experi¬ 
ment. Similarly, the first five contributions of the sixth and 
eighth ranking designs were by the same user. However, in 
almost all other designs in these top 10 designs (with just 
one exception: the seventh ranked design), we see that the 
designs were created by a user with lower numbers of total 
contributions and then picked up by users who contributed 
more overall to the experiment. For example, the second 
best design was created by a user who contributed just 4 runs 
to the experiment. Then a user that contributed 6 runs picked 
up the design and then a user that devoted 25 runs drew that 
same robot to contribute a run to the hill-climber. This pat¬ 
tern of contributions could be the result of various social 
dynamics. It may be that a design that was initially promising 
caught the attention of those users that are more 
participatory. Or this could be a general pattern of behavior 
for any design created socially. However, if we examine 
the same table for the worst-performing designs (Table 4), the 
pattern is not as prevalent. But we do see this pattern in the 
eighth and ninth worst designs. Future work will investi¬ 
gate these social dynamics that contribute to specific robot 
designs. 


Rank   User 1   User 2   User 3   User 4   User 5
1      9        9        9        9        9
2      4        6        25       25       25
3      3        17       17       88       88
4      6        29       52       52       52
5      52       52       52       52       316
6      28       28       28       28       28
7      17       14       14       14       14
8      11       11       11       11       11
9      8        8        8        83       79
10     10       13       75       75       75


Table 3: Total number of runs contributed by each of the first 
5 users to work on the top ranking designs. Repeating num¬ 
bers indicate the same user contributing runs to the design. 
The best design is at the top and the tenth best design is at 
the bottom. 

Additionally, the mean value of this latent factor over 
the course of time (as measured in number of designs con¬ 
tributed by each group) can be seen in Fig. 8. This figure 
strongly suggests a slow increase in the social group’s us¬ 
age of the latent factor, whereas it appears that this value 
fluctuates over time for the independent group. More data is 
required to evaluate whether this trend continues in order to 
verify that this is indeed a trend; but the steady increase of 
this quantity in the social group is suggestive. 

Rank   User 1   User 2   User 3   User 4   User 5
10     133      133      133      133      133
9      38       148      148      148      148
8      23       79       79       79       79
7      7        4        6        6        15
6      35       35       35       35       35
5      50       50       50       50       50
4      23       23       23       10       252
3      52       52       52       52       52
2      46       46       46       46       46
1      41       41       41       41       41

Table 4: Total number of runs contributed by each of the 
first 5 users to work on the bottom ranking designs. The 
worst design is at the bottom and the tenth worst-performing 
design is at the top. 

Figure 8: Mean value of latent factor over time (index 
over designs created by participants). Error bars indicate 
95% confidence levels. Participants creating designs socially 
maintained a near constant increase in the latent value, 
whereas participants working independently varied their 
incorporation of this value in their designs. 

Conclusion and Future Work 

In this study, anonymous Web participants designed robots 
in their browsers. Our hypothesis was that there exists a 
measurable quantity derived from the social design of adaptive 
machines. Using the designs created by study participants, 
we derived a latent geometric quantity through a dimensionality 
reduction technique. We compared the prevalence 
of this derived latent value in the social and independent 
groups. We found that this value was more prominent in a 
group of participants working socially than those working in 
isolation. We observed that this value followed an increas¬ 
ing trend in the social group designs whereas the quantity 
fluctuated in the designs created by the independent group. 

The latent value that we uncovered is derived from the 
symmetry and number of limbs in robot designs. That this 
value was derived in part from symmetry corroborates pre¬ 
vious work on social design of adaptive machines over the 
Web. The latent value was shown to have a positive corre¬ 
lation with the desired outcome in robot designs, which was 
that of fast forward locomotion. Thus, the social quantity 
may have played a role in the superior outcome in social de¬ 
sign of robots. Further work is required to demonstrate how 
such aggregate social quantities influence specific designs, 
such as those that are among top performers. 

Deriving such measurable social quantities can be use¬ 
ful for their incorporation into automated methods. In fu¬ 
ture work, we will use crowdseeding (Wagy and Bongard 
(2015b)) to enable machine design of robots by incorporat¬ 
ing the SVD-derived objective into a design objective. 

Additionally, we would like to utilize the methodology 
introduced here to analyze crowd designs in order to provide 
immediate feedback to the crowd. By providing feedback to 
the crowd during the social design process, we may be able 
to capitalize on crowd preferences and biases in real time and 
thus accelerate the search process. 

This technique may be useful more broadly in social and 
human computing. But understanding whether social design 
preferences are destructive or constructive in other domains 
is critical to determining the general utility of such methods. 

Research into perceptual and behavioural adaptations to 
radical disruptions of the agent-environment coupling has 
long been an interest of dynamical and embodied agent-based 
modeling. Work in evolutionary robotics has produced a 
series of minimal models of homeostatic adaptation to 
inversions of the sensory field (e.g., Di Paolo, 2000a, Iizuka 
and Maeda, 2013) as broad replications of classical 
experiments by Kohler (1964) and others on adaptation to 
wearing goggles that invert or distort vision. 

While these models draw inspiration from Kohler’s 
experiments, their sensorimotor instantiation is rather 
minimal, typically involving two point photoreceptors and a 
point source of light. This sensorimotor space is arguably a 
better match for sensorimotor engagements in the auditory, 
rather than the visual, modality (e.g., Di Paolo, 2000b). 

However, unlike the striking visuo-motor adaptation shown 
by Kohler’s participants, empirical evidence for similar kinds 
of adaptation to radical disruptions of the auditory space, such 
as inversion of the left and right directions, has not yet been 
found. This is in part due to the technical difficulties 
involved, which make these studies rather scarce, but also 
possibly because of the kind of sensorimotor relations and 
plasticity at play in auditory perceptual learning. 

Here we report on the development and a series of 
preliminary studies regarding the role of activity and passivity 
in human adaptation to wearing a left-right auditory inversion 
device, or pseudophone. 

From an enactive perspective, perception is intimately 
related to action (Varela et al., 1991). Perception is constituted 
by the active skillful use of the regularities that govern the 
ongoing coupling between motor and sensory activity, also 
known as sensorimotor contingencies (SMCs) (O’Regan and 
Noe, 2001). Perceptual adaptation, in this context, involves 
the ongoing equilibration of sensorimotor schemes (Di Paolo 
et al 2014). Sensorimotor disruptions present the perceiver 
with radical obstacles and lacunae as her sensorimotor skills 
suddenly cease to make sense. The process of re-learning 
sensorimotor schemes for particular actions provides rich 
information regarding the complex coordinations involved, 
which because they are nearly maximally equilibrated, are not 
always obvious during normal sensorimotor engagements. 

One of the key recurring elements found in empirical and 
modeling studies of radical adaptation is the need for 
self-generated activity by the agent. New sensorimotor schemes 
cannot be learned unless the agent engages the world actively, 
confronts various breakdowns, and tries to recover from 
them. This is clear in Kohler’s experiments. During prolonged 
wearing of left-right inversion goggles, participants initially 
feel baffled and their movements induce unexpected visual 
changes, as if aspects of the visual scene expected to remain 
static moved without correspondence to the action. This is due 
to the sign-inversion introduced by the goggles, such that 
extra-retinal proprioceptive signals cease to compensate for the 
retinal flows provoked by the static background of a visual 
scene as the head turns right or left. As participants gradually 
adapt in specific contexts, their movements cease to produce 
the sensation of background instability. 

Interestingly, in the case of inversion of auditory space 
using a pseudophone, perceptual instability also appears when 
participants move. This was shown in a study of localization 
of sounds with this device (Ohtsubo et al., 1982). The 
authors, by allowing participants to move their heads during a 
brief activation of the sound sources, found that the variability 
in their responses increased significantly. Participants 
reported that when they moved they felt that (static) sound 
sources also did. Young (1928) and Hofman et al. (2002) also 
reported a similar effect. Beyond these few studies there 
has not been much systematic analysis of the motor strategies 
used by participants when wearing these devices. 

To further investigate movement-induced de-stabilization, 
we evaluate the experience and performance of participants 
equipped with a pseudophone in two sound localization tasks. 
Our device works in two modes, with or without inversion of 
sound signals (figure 1A). This allows us to compare 
experimental performance under each condition. All testing 
protocols are conducted without visual cues in a soundproof 
room. 

First, in Task 1 we analyze different situations where 
variations to the sound stimulation are the same, but in one 
case this variation is obtained passively and in the other 
actively. In the passive motion condition the participant 
remains still and the sound source moves. In the active motion 
condition the source remains still and the participant moves. 
Figure 1B shows an example of one of the pairs of 
movements evaluated. The stimuli were metronome clicks 
(160 beats per minute). 

The participant’s experience of the same changes in sound 
stimulation was different depending on the pseudophone 
mode. Without inversion there are no reported differences 
between the active and passive conditions. Participants 
reported that they were able to recognize the position of the 
sound source. In the "inversion" mode the two conditions 
result in different experiences. In the passive condition 
perceptual experience is similar to the non-inversion case, 
except that the source is perceived as moving in the opposite 
direction. In the active condition participants experienced 
strange sensations. For instance, in the case of rotational 
movement of the head (figure 1B) participants reported that 
sound changes were more abrupt, faster and less predictable 
than in the passive motion condition. 



Figure 1. (A) Participant wearing the Pseudophone. (B) 
Diagram of Task 1 with passive and active movements. (C) 
Diagram of sound localization task (Task 2). 


These qualitative differences in the case of auditory 
inversion reveal a proprioceptive component in spatial 
hearing. If in the passive condition a rotational movement of 
the source from left to right is perceived as a movement from 
right to left, in the active condition the head movement is 
added to the apparent movement of pure auditory sensations 
instead of subtracted from it, which apparently gives the 
feeling of a source moving at a greater speed. 

In Task 2, we also investigated passive vs. dynamical 
source localization. In this task, participants remained seated 
in front of 7 speakers in the horizontal plane (figure 1C). 
When a source is activated the participant has to turn her head 
and face it. In the passive condition, participants listen to a 
very short sound, a 250 ms pulse of pink noise, from one of 
the sources and then have to face the source; in the dynamic 
condition, the sound source remains active with a burst of 
pink noise until the participant locates it. We measure the 
accuracy of the location responses and pattern of movements. 
Phenomenal data is also recorded. 

Without inversion participants perform accurately in both 
conditions. In the "inversion" mode in the passive condition 
responses are mirrored in the opposite hemifield to the sound 
source, e.g. if the sound source is at +45° the participant moves 
close to -45°. By contrast, in the dynamic condition, when 
participants could freely explore the sound environment, hit 
levels are similar to those obtained in the mode "without 
inversion". In this condition, participants spontaneously 
develop sensorimotor strategies that help them resolve the 
conflicting information. To respond, the strategies used at the 
beginning involve large amplitude movements, sweeping all 
the frontal plane, and then a series of smaller movements to 
refine the position. 

Participants mention that their movements gave them 
the feeling that the sound source also moved in unexpected 
ways. They produce comments like “when I turn the head to 
face the sound, it escapes very quickly”. This sensation 
happened when the participant turned the head toward the 
hemifield opposite the sound source. Conversely, movement 
toward the source causes the source to “appear” and 
“disappear” suddenly from the front of the participant, and 
sound intensity varies rapidly. Participants commonly report 
front-back confusions. As the person starts to move, the sound 
source in the frontal plane is sometimes perceived in the 
opposite position on the backplane. This phenomenon is 
probably because the SMCs enacted are similar to those 
usually used to locate sounds in the backplane. 

The de-stabilization of SMCs caused by the pseudophone 
allows us to investigate different kinds of sensory activity 
involved in the auditory system in non-obvious ways. As a 
next step we plan to investigate a hypothesis drawn from a 
minimal cognition model by Izquierdo and Di Paolo (2005), 
who showed that radical sensorimotor ambiguities (such as 
whether the sensor array is left-right inverted or not) can be 
coped with using a single sensorimotor strategy generated by 
reactive control. Participants trained in the use of the 
pseudophone under random variations of the inverted/not 
inverted modes should be expected also to converge to the use 
of a single sensorimotor strategy valid for both cases. 

Abstract 

The non-embodied approach to teaching machines language 
is to train them on large text corpora. However, this approach 
has yielded limited results. The embodied approach, in con¬ 
trast, involves teaching machines to ground abstract symbols 
in their sensory-motor experiences, but how—or whether— 
humans achieve this remains largely unknown. We posit that 
one avenue for achieving this is to view language acquisition 
as a three-way interaction between linguistic, sensorimotor, 
and social dynamics: when an agent acts in response to a 
heard word, it is considered to have successfully grounded 
that symbol if it can predict how observers who understand 
that word will respond to the action. Here we introduce a 
methodology for testing this hypothesis: human observers is¬ 
sue arbitrary commands to simulated robots via the web, and 
provide positive or negative reinforcement in response to the 
robot’s resulting action. Then, the robots are trained to pre¬ 
dict crowd response to these action-word pairs. We show that 
robots do learn to ground at least one of these crowd-issued 
commands: an association between ‘jump’, minimization of 
tactile sensation, and positive crowd response was learned. 
The automated, open-ended, and crowd-based aspects of this 
approach suggest it can be scaled up in future to increasingly 
capable robots and more abstract language 1 . 

Introduction 

Language has been a central concern in Artificial Intelli¬ 
gence research since the field’s inception in the 1950s. Like 
many other aspects of cognition, it has been addressed with 
non-embodied and embodied approaches. Non-embodied 
approaches typically train agents on large pre-existing text 
corpora (Guha and Lenat, 1993; Collobert and Weston, 
2008), or on conversations those agents attempt to conduct 
with humans (Shawar and Atwell, 2003). However, this ap¬ 
proach has yielded limited results. 

The embodied approach to language acquisition involves 
helping agents to detect correlations between particular sub¬ 
sets of sensorimotor experiences and categories, to which 
words can be attached. However, how or whether humans 
do this, and how best to enable robots to do this, remains an 
open question. 

1 A video summary of the work described here is available 
at youtu.be/j3sB85ENgA8 and the source code is available at 
github.com/Janetsbe/AnetsbergerAlife2016Code. 

The Symbol Grounding Problem 

The symbol grounding problem is a long-standing open 
problem in cognition. It concerns how we can assign mean¬ 
ing to parts of language—symbols—without succumbing to 
an infinite regress. That is, some symbols must at some point 
be grounded in something, such as categorical or iconic rep¬ 
resentations, rather than deriving meaning from other sym¬ 
bols. This is a major problem with the cognitivist approach 
to intelligence (Harnad, 1990). 

Evidence is starting to appear in the literature that 
suggests that, for humans at least, sensorimotor experience 
is the ‘soil’ in which language symbols are ultimately 
grounded. For example, Pulvermüller and Fadiga (2010) 
describe how spoken language may be closely coupled to 
neural circuits related to motor functions. Cangelosi and 
Harnad (2001) have provided theoretical arguments for why 
sensorimotor experience provides such good grounding for 
language. First, agents perform ‘sensorimotor toil’: they 
acquire knowledge of categories (though not symbols to 
represent them) through the costly effort of learning through 
action and feedback. They then perform ‘symbolic theft’: these 
categories are given symbolic representations and shared by 
those who have performed the necessary toil or who have 
themselves “stolen” from others. However, exactly how this 
can be instantiated in machines remains an open question. 

Sensorimotor Grounding 

Steels (2008) claims that a solution to the symbol grounding 
problem has been found through the creation of robots who 
can respond appropriately to human-provided commands. 
However, it is not clear how such approaches can scale up to 
more complex robots, large numbers of human tutors, and 
increasingly abstract language. In this paper we introduce 
a methodology that may, in future work, help to support all 
three. The novelty of our approach in its present form is how 
it allows people to teach robots aspects of human language 
of the crowd’s choosing. 





In other work involving non-human languages, Steels has 
demonstrated a scalable approach to language acquisition 
among robots (Steels et al., 2002). The robots participate in 
language games through which they converge on agreement 
as to the meaning of words generated by the group. 
Similarly, Schulz et al. (2012) have shown that robots can jointly 
form words for relative locations and then draw on the 
combinatorial power of symbols to dictate directions to novel 
locations by combining these words in novel ways. Whether 
or how these robot-generated languages could help them un¬ 
derstand human languages however has not been addressed. 
Here, we focus on robots learning human languages. 

Crowdsourcing language acquisition. 

Despite these initial successes in teaching robots language 
(or having them teach themselves), there are many open 
questions that remain. How should robots (or how do hu¬ 
mans) ground abstract language in action? Lakoff and John¬ 
son (2008) have argued that embodied metaphors (such as 
“do not jump to conclusions”) hint that we ground even ab¬ 
stract language in sensorimotor experience, but the mecha¬ 
nisms by which this occurs are unclear. How does the ac¬ 
quisition of some symbols facilitate the acquisition of oth¬ 
ers? Do caregivers spontaneously constrain their utterances 
to scaffold language learners (Roy et al., 2009)? 

These and other questions can only be addressed by scal¬ 
able, open-ended and automated infrastructure which en¬ 
ables large numbers of people to teach large numbers of 
(possibly increasingly abstract) language to increasingly 
complex robots capable of a broadening set of sensorimo¬ 
tor experiences. Here we introduce such an experimental 
apparatus that relies on crowdsourcing. Through the web, 
observers propose arbitrary, natural language commands to 
robots and provide positive or negative reinforcement for the 
resulting actions. The robots then learn to predict which ac¬ 
tions, under which commands, are likely to generate positive 
crowd reinforcement. Various hypotheses about language 
acquisition can then be tested using this apparatus. Here, 
we first test the hypothesis as to whether robots can indeed 
ground these crowd-proposed symbols by forming theories 
of their group mind. 

Crowdsourcing has recently been exploited for training 
robots to interact with humans (Breazeal et al., 2013; Toris 
et al., 2014), but not for grounding language symbols. We 
have demonstrated that web participants can collectively de¬ 
sign and optimize robots, despite the lack of any explicit 
reward to the human participants for doing so (Wagy and 
Bongard, 2014, 2015). In other work we have shown that 
robots can be trained to form theories of mind about an indi¬ 
vidual human trainer (Hornby and Bongard, 2012), and even 
disambiguate between two trainers who disagree about how 
to reinforce the robot’s behavior (Bernatskiy et al., 2014). 
However, in Hornby and Bongard (2012) and Bernatskiy 
et al. (2014) the robots only learned theories of mind about 
synthetic trainers: bots who stood in for actual human 
trainers. Here we demonstrate that, with the right 
cyberinfrastructure, robots can successfully ground symbols 
provided by actual human caregivers. 

This research was conducted in two phases. First, subjects 
were recruited to a web interface through which they could 
issue commands to the robots and reinforce the resulting be¬ 
haviors. The data generated by this process was then used to 
train models to predict the crowd’s response to a given com¬ 
mand and set of actions. See Fig. 1 for a summary of this 
two-part methodology. 

Phase I: Crowd Deployment 

In order to allow subjects to see and interact with the 
robotics simulation, the Twitch.tv 2 video streaming web ser¬ 
vice was selected. Twitch is a popular internet service for 
observing others play video games or perform other skilled 
tasks. Twitch in turn has given rise to “Twitch Plays...” in¬ 
terfaces through which subjects can collectively observe as 
well as play interactive video games by voting on the next 
move in the game. It was hoped that the wide appeal and fa¬ 
miliarity of this interface would incentivize large-scale par¬ 
ticipation. In the work described here, we broadcast a live 
stream from a robotics simulation; subjects could then in¬ 
teract with the robots observed in the stream using live chat 
(Fig. 2; a video snippet from the deployment can be seen 
here.). 

Phase I Methods 

The crowd deployment commenced on October 29th, 2015. 
The video stream ran continuously and saw use for 22 days. 
The experiment was terminated then as user traffic had be¬ 
come negligible. During the crowd deployment, subjects 
were shown a single robotics simulation to collectively in¬ 
teract with through text input. Subjects were allowed to 
provide candidate commands and reinforcement signals to 
the simulation (see Table 1 for terminology used throughout 
this paper). Candidate commands were strings representing 
votes for what behavior a subject wanted the robot to be eval¬ 
uated against. These votes were tallied over a three minute 
interval; the most frequently-issued candidate command at 
the end of this period became the new, issued command. An 
issued command was some string dictating how the subjects 
should reinforce behaviors over the next three minute period. 
Reinforcement signals were strings indicating whether sub¬ 
jects considered the robot they were viewing to be obeying 
the issued command (‘y’) or not (‘n’). 

Robot Simulation. The simulation was developed in the 
Unity3D 3 engine, with its default graphics renderer, physics 
engine, and collision solver. It consisted of a scene 
containing all elements of the simulation: a floor, start 
location marker, a robot, a camera, and GUI elements providing 
users with instructions and feedback. Fig. 2 reports a typical 
scene during deployment. All data pertaining to the robot, 
its controller, and GUI elements were sent to the simulation 
by the master program. The simulation ran more or less 
continuously for the 22 days. The simulation was continuously 
recorded and sent as a live video feed to Twitch.tv. 

2 www.twitch.tv 

3 www.unity3d.com 

Figure 1: Summary of the methodology. Deployment: A master program generates robot controllers, which animate one robot 
after another in a physics simulation. The video resulting from the simulation is piped in real time to www.twitch.tv, where 
the human subjects issue the robots commands and reinforcement. Data Set: The simulations generate a data set comprised of 
the robots themselves (R), the commands issued to them (C), the controllers run on those robots under those commands (N), 
and the reinforcement signals provided by the subjects (S). Rerun in Engine: All robots which had been issued the command 
“jump”, and received at least one reinforcement signal, were re-simulated. Sensor Data: The resulting touch sensor data (T) 
was recorded and added to the data set, along with the normalized reinforcement signal (o). Training: The time series touch 
sensor data for a randomly-selected half of these controllers were employed to train a recurrent neural network to predict ($o'_{ijk}$) 
the crowd’s actual response ($o_{ijk}$) to each controller using CMA-ES. Testing: The ability of the trained RNN to ground the 
symbol “jump” was tested by measuring its ability to predict crowd response to the other half of the controllers. 

The simulation featured two robot types (Fig. 3). The 
simple robot was formed of three rigid bodies and two rota¬ 
tional hinge joints. Each body segment was 2.5 x .6 x 3.0 
units in size. The simple robot’s joints rotated through the 
sagittal plane, thus restricting movement to forward and 
backward motion. However, due to asymmetries in collision 
resolution, the simple robot had the ability to slowly turn, 
so constraints were imposed in the simulation to frustrate 
turning. The complex robot consisted of seven segments 
(three body segments and four legs) and six hinge joints, 
and was allowed to move about the horizontal plane. Its 
body segments and leg segments measured 1.2 x .4 x 3.0 and 
.6 x .4 x 3.0 units respectively. Both robots were equipped 
with nonfunctional eyes to provide subjects with a robot¬ 
centric frame of reference through which to issue commands 
(e.g. ‘move forward’). 

During the 22 days of deployment, the two robots 
alternated every hour: the crowd saw the simple robot for an 
hour, then the complex robot for an hour, then the simple 
robot again, and so on. During each hour period, the 
command issued to the robot changed every three minutes. 

During each three-minute period, all candidate command 
votes were counted, recorded, and cleared. The most 
frequently-input candidate command during this period was 
issued to the robot for the next three-minute period. Within 
a three-minute period, six random controllers were 
evaluated on the robot, each for 30 seconds. We henceforth refer 
to each of these as a ‘robot evaluation’. The total number 
of positive and negative reinforcement signals issued in 
response to each robot evaluation was recorded. 
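
The voting and scheduling rules described above can be illustrated with a minimal Python sketch. It is not the authors' master program: the function names, the whitespace and case normalization of votes, the tie-breaking rule, and the assumption that the deployment clock starts on the simple robot are all hypothetical choices made for illustration.

```python
from collections import Counter

HOUR_SECONDS = 3600           # robots alternate every hour
COMMAND_PERIOD_SECONDS = 180  # a new command is issued every three minutes
EVALUATION_SECONDS = 30       # each controller is shown for 30 seconds


def tally_votes(candidate_commands):
    """Return the most frequently proposed command, or None if nobody voted.

    Assumption: votes are compared after stripping whitespace and lowercasing;
    ties go to the command that was proposed first.
    """
    counts = Counter(cmd.strip().lower() for cmd in candidate_commands)
    if not counts:
        return None
    command, _ = counts.most_common(1)[0]
    return command


def schedule_position(elapsed_seconds):
    """Map elapsed deployment time to (robot type, command period, evaluation slot).

    Assumption: the clock starts at the beginning of a 'simple' robot hour.
    """
    hour_index, within_hour = divmod(int(elapsed_seconds), HOUR_SECONDS)
    robot_type = "simple" if hour_index % 2 == 0 else "complex"
    period_index, within_period = divmod(within_hour, COMMAND_PERIOD_SECONDS)
    evaluation_slot = within_period // EVALUATION_SECONDS  # 0..5, six per period
    return robot_type, period_index, evaluation_slot
```

With these constants each three-minute period contains 180 / 30 = 6 evaluation slots, matching the six robot evaluations per issued command.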

A robot’s controller was defined as an N x 3 matrix with 
dimension sizes corresponding to the number of joints and 
three parameters which dictated a given joint’s amplitude 
($\alpha$), frequency ($\beta$), and phase offset ($\gamma$). Thus, each joint at 
each time step t was issued a desired angle $\alpha \sin(\beta t + \gamma)$, 
where $\alpha \in [0.6, 1.5]$; $\beta \in [0.01, 0.09]$; and $\gamma \in [-10\pi, 10\pi]$. 
These constants were drawn from these ranges randomly 
using a uniform distribution and populated the matrix for each 
controller. The ranges were selected to maximize variation 
in behavior. Thus, the subjects observed random behaviors 
(rather than evolved or learned ones) in this study. 
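
As a rough illustration of this controller definition, the following Python sketch draws one row of (amplitude, frequency, phase offset) per joint from the stated ranges and evaluates the desired joint angles at a time step. The function names and the list-of-tuples representation are assumptions for illustration only; the original controllers ran inside the Unity3D simulation.

```python
import math
import random

AMPLITUDE_RANGE = (0.6, 1.5)
FREQUENCY_RANGE = (0.01, 0.09)
PHASE_RANGE = (-10 * math.pi, 10 * math.pi)


def random_controller(num_joints, rng=random):
    """Return a num_joints x 3 controller: one (amplitude, frequency, phase) row per joint.

    Each parameter is drawn uniformly from its range, as described in the text.
    """
    return [
        (
            rng.uniform(*AMPLITUDE_RANGE),
            rng.uniform(*FREQUENCY_RANGE),
            rng.uniform(*PHASE_RANGE),
        )
        for _ in range(num_joints)
    ]


def desired_angles(controller, t):
    """Desired angle for every joint at time step t: amplitude * sin(frequency * t + phase)."""
    return [a * math.sin(b * t + g) for a, b, g in controller]
```

Under this sketch the simple robot would use `random_controller(2)` and the complex robot `random_controller(6)`, following the joint counts given earlier.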

The robot’s color was changed whenever the controller 
was changed. The colors cycled through blue, orange, and 
violet, and then back to blue. Subjects were instructed to 
indicate the color of the robot that they were reinforcing. For 
example, if the user wished to positively reinforce the blue 
robot, she would type ‘by’. If she wished to negatively 
reinforce it, she would type ‘bn’. Because video streams 
broadcast through Twitch are delayed for approximately ten to 
twenty seconds with variation between viewers, this coded 
input allowed for correct assignment of reinforcement to a 
controller despite the delay. That is, subjects likely observed 
a controller (and provided reinforcement to it) after it had 
already terminated on the host computer. 

Figure 2: The interface, as seen by participants interacting with the simulation. a) The simulation video feed. b) A panel 
prompting participants to command what the robot should do next. At this point, one subject has proposed ‘walk forward’ as 
the next command, and two other subjects have voted for this. After a five-minute period, the most popular command takes 
effect. c) A panel prompting users to provide positive (‘y’) or negative (‘n’) reinforcement for the current action under the 
current command, which here is ‘walk forward’. Votes for either signal allow subjects to see how others are reinforcing this 
action. d) Twitch’s chat interface, through which subjects send commands, reinforcement, or other miscellaneous chatter. 

Figure 3: The “simple” (left) and “complex” (right) robot 
body types. Body segments were connected through hinge 
joints with normals as indicated by the vectors. 


The simulation’s GUI provided directions and feedback 
to the subjects. In the command panel, the subjects were 
told: “If you want to command the next robots, just tell them 
what to do in chat”. In the reinforcement panel, the subjects 
were told: “If you want to teach the [v]iolet [or [o]range 
or [b]lue] robot, say ‘vy’ if yes, it’s obeying the command: 
[command]. Say ‘vn’ if no, it’s not.” 

Video Streaming and Chat Capture. Video streaming 
was done on Twitch.tv, an internet video streaming service. 
The service allows a host user to broadcast a live video 
stream to an arbitrary number of internet users who may 
(optionally) talk to each other and the host through an em¬ 
bedded IRC service. Twitch was used due to its established 
user base, chat functionality, available API, as well as the 
existence of the previously mentioned “Twitch Plays” phe¬ 
nomenon. 

Video streaming was done using FFSplit 4 to capture 
and encode video of the simulation being run on the host 
machine and stream it to Twitch, where it was viewed by 
subjects. Subjects saw this simulation and were able to 
interact with it through Twitch’s chat service. Subjects were 
allowed to provide any combination of characters to the chat. 
Any string of length greater than two and less than thirty that 
was not part of Twitch’s default filtered word list was 
considered a candidate command. Subjects could provide 
commands and reinforcement at any time. 

4 http://www.ffsplit.com 

| Term | Definition |
|---|---|
| Subject | Any Twitch user who provided either a reinforcement signal or candidate command. |
| Robot type | $R_i$, $i \in \{0, 1\}$. The simple ($R_0$) and complex ($R_1$) robots. |
| Candidate command | A string entered by a participant. Considered a vote for a future issued command. |
| Issued command | $C_{ij}$. The jth [c]ommand the ith robot was evaluated against. |
| Controller | $N_{ijk}$. The kth controller issued to the ith robot under command j. A matrix of coefficients used to dictate a robot’s movements. |
| Robot evaluation | The 30-second simulation resulting from a controller issued to a robot under a given command. Six such evaluations were performed for each issued command. |
| Reinforcement signal | $S_{ijkl}$ for $l \in \{0, 1\}$. The number of negative ($l = 0$) and positive ($l = 1$) reinforcements collected from subjects in response to the kth controller under the jth command. |
| Normalized reinf. signal | $o_{ijk} \in [-1, 1]$. $o_{ijk} = \frac{|S_{ijk1}| - |S_{ijk0}|}{|S_{ijk1}| + |S_{ijk0}|}$ |
| Touch sensor data | $T_{ijk}$. Touch sensor data generated by robot i under command j with controller k. |
| Predictive model | A recurrent neural network (RNN) which predicts $o_{ijk}$ given $T_{ijk}$. Its output, $o'_{ijk}$, is an estimation of the crowd’s response to robot evaluation ijk. |

Table 1: Terminology used throughout this paper. 

Subjects provided reinforcement through specific coded 
input. This consisted of strings of length two which matched 
the pattern (b|o|v)(y|n), where the first character 
represented the color of a robot (blue, orange, or violet), and the 
second either yes or no. Further, only reinforcement signals 
corresponding to the current or immediately previous robot 
evaluation were counted. For example, if the current 
evaluation’s color was blue and the previous evaluation had been 
orange, the following strings would have been parsed as 
reinforcement signals: ‘on’, ‘oy’, ‘bn’, and ‘by’. 
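
The chat-parsing rules above, together with the normalized reinforcement signal from Table 1, can be sketched in Python as follows. This is an illustrative reading of the text rather than the deployed code: the function names, the return format, the whitespace stripping, and the handling of an evaluation with no votes are assumptions, and `filtered_words` is a hypothetical stand-in for Twitch's filtered word list.

```python
import re

# Two-character coded input: robot color (b/o/v) followed by yes/no (y/n).
REINFORCEMENT_PATTERN = re.compile(r"^(b|o|v)(y|n)$")


def classify_message(text, current_color, previous_color, filtered_words=()):
    """Classify one chat message as reinforcement, candidate command, or ignored.

    Reinforcement is counted only for the current or immediately previous
    evaluation's color. Candidate commands are strings longer than two and
    shorter than thirty characters that are not in the filtered word list.
    """
    text = text.strip()  # assumption: surrounding whitespace is not significant
    match = REINFORCEMENT_PATTERN.match(text)
    if match:
        color, vote = match.groups()
        if color in (current_color, previous_color):
            return ("reinforcement", color, vote == "y")
        return ("ignored", None, None)
    if 2 < len(text) < 30 and text not in filtered_words:
        return ("candidate", text, None)
    return ("ignored", None, None)


def normalized_signal(positive, negative):
    """Normalized reinforcement in [-1, 1]: (positive - negative) / (positive + negative).

    Returns None when no reinforcement was received (an assumption; the paper
    only analyzes evaluations with at least one signal).
    """
    total = positive + negative
    if total == 0:
        return None
    return (positive - negative) / total
```

For instance, an evaluation that received three 'n' votes and no 'y' votes has a normalized signal of -1, the unanimous negative case.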


Recruitment and Incentivization. Subjects were 
recruited through Reddit, a popular internet message board 
site consisting of hundreds of sub-sites (called subreddits) 
which generally limit posts to a specific topic. Posts were 
issued to relevant subreddits directing subjects towards this 
main post on the artificial subreddit 5 . Here, subjects 
were directed to an agreement page which sent them to the 
twitch channel 6 . 

Subjects were incentivized to interact with the system by 
adding features to the GUI which were meant to provide the 
subjects with a sense of involvement with the simulation. 
This was done by displaying subjects’ user names when 
valid input was given. This showed subjects that their input 
was being counted and used; their actions had an impact. 
For the scope and scale of this study, this seemed sufficient. 

Phase I Results 

The crowd deployment lasted 22 days. During this time, at 
least 6,388 robot evaluations were seen by hundreds of 
subjects, who provided hundreds of commands and thousands 
of reinforcement signals. Table 2 reports the relevant 
values. 

| General Figures | Values |
|---|---|
| Subjects | 424 |
| Robot evaluations sent | 57,108 |
| Robot evaluations observed | > 6,388 |
| Subject inputs | 16,449 |
| Commands | Values |
| Candidate commands entered | 8943 |
| Candidate commands/subject | ≈ 21.1 |
| Distinct commands issued | 266 |
| Most frequently issued commands | jump (385), walk forward (58), move forward (41), run (26), crawl forward (20) |
| Reinforcement signals | Values |
| Reinforcement signals entered | 7503 |
| Mean reinforcement signals/eval. | ≈ 1.18 |
| Mean reinf. signals/subject | ≈ 17.7 |
| Proportion of positive reinforcement | o = 0.28 |

Table 2: Crowd deployment participation results. 

Participation produced a data set in which 6,388 
controllers received at least one reinforcement signal. From this 
data set, no significant differences were found between how 
the crowd interacted with the simple and the complex robots: 
both had similar distributions in their spread and frequency 
of commands, and both received similar consistency of 
reinforcement. 

Phase II: Offline Learning 

In the second phase of this experiment, models were trained 
to ground symbols by learning relationships between 
commands, sensor data, and reinforcement. 

5 www.reddit.com/r/artificial 

6 www.twitch.tv/janetsbe 

The most commonly issued command for both robots was 
“jump”: 1072 simple and 698 complex robot controllers 
were evaluated under this command. Data belonging to 
robots given this command were used to test whether the 
crowd was able to provide trainable input. Due to the on¬ 
line nature of the live deployment, some robot evaluations 
were prematurely terminated and did not remain visible to 
the crowd for their full thirty seconds. These were discarded, 
resulting in 1037 simple and 675 complex robot evaluations 
used during this phase. Since each controller directly de¬ 
termined the robot’s behavior for the duration of its run, any 
controller could be re-evaluated to obtain any additional data 
not recorded during the live deployment. 


sensors and body segments) and one output node, with 
one synapse from each input to the output, and a recurrent 
synapse on the output node. The RNN trained and tested on 
the complex robot had seven input nodes and was otherwise 
identical. A graphical representation of this RNN for the 
simple robot’s data can be seen in Fig. 1. 

During training, each controller in the training set for Ri 
had its sensor data run through the neural network such that 
the network was updated m times. Then, the output node’s 
value was read out as o', the predicted normalized reinforce¬ 
ment signal for the ith controller based on its sensor data. 
This was done for each controller in the training set. Using 
this, the error for the model was calculated using the follow¬ 
ing objective function: 


Phase II Methods 

During this phase, the goal was to determine if there exists 
features in the “jump” data set which could be used to train a 
model that predicts the crowd’s response. A recurrent neural 
network (RNN) was employed as such a model (Fig. 1). It 
was trained such that, when supplied with sensor data gen¬ 
erated by a controller, it outputs a successful prediction of 
the normalized reinforcement signal Oijk (Table 1). 

Since the chosen command was “jump”, it seemed 
likely that this command should have some relationship to 
whether, when and how the robot contacts the floor. For this 
reason touch sensor data was recorded for these controllers. 

All of the controllers evaluated under the command 
“jump” and which received at least one reinforcement signal 
in return were then re-evaluated with touch sensors added 
to the robots. If a body segment touched the ground, its 
touch sensor recorded a value of 1; otherwise a 0 was 
recorded. This was stored in a matrix such that element
$t_{ijkmn} \in \{0, 1\}$ indicated whether body part n contacted the
ground at time step m for the controller indexed by ijk. Each of these
controllers was re-evaluated for the same duration as its original
run during deployment. Since the master program gather¬ 
ing data from the Twitch channel was also controlling the 
simulation’s timing, there was some slight variation in the 
duration of each robot evaluation, which resulted in slight 
variations in the time step length. Thus, m ranged from
3350 to 3761 for the simple robot and from 3133 to 3719 for the
complex robot. During model training and testing,
sensor data was therefore truncated to m = 3350 and m = 3133
for the simple and complex robots respectively to make data
consistent in shape for each robot type.

Model training. Offline learning was conducted sepa¬ 
rately for the simple and complex robot types. For each 
of the two robots, half of their controllers were assigned 
randomly to the training set. This was repeated 100 times, 
leading to 100 models trained on 100 partially-overlapping 
training sets. The RNN associated with the simple robot 
had three input nodes (corresponding to the number of touch 


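$$e = \frac{1}{2}\left( \frac{1}{N_n}\sum_{i=1}^{N_n} \left| o_i - o'_i \right| + \frac{1}{N_y}\sum_{j=1}^{N_y} \left| o_j - o'_j \right| \right) \qquad (1)$$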


Since the normalized reinforcement signals mostly consisted
of unanimous negative reinforcement ($O_i = -1$),
accounting for 77.7% and 77.8% of robot evaluations for the
simple and complex robots respectively, this error function
weighed two subsets of the training set equally in calculating
error, to avoid over-fitting the model to output -1 for any
set of sensor values. One subset consists of all robot evaluations
with $O_i = -1$ (of which there were $N_n$) and the other
of all remaining evaluations ($O_i > -1$, of which there were $N_y$),
each of which must by definition contain at least one positive
reinforcement signal. Because the error calculation weighs both
groupings equally, a constant output of $O_i = -1$ for every
controller would result in, at best, e = 0.5. Error is, then, the
sum of differences in each group, averaged, where o is the actual
normalized reinforcement signal for each robot evaluation and o' is the
model’s prediction.
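The two-group weighting described above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the "difference" is taken to be an absolute difference, and the function name `balanced_error` is hypothetical rather than taken from the authors' code.

```python
def balanced_error(actual, predicted):
    """Average the per-group mean absolute error of two groups.

    Group 1: evaluations whose actual signal is exactly -1.
    Group 2: all other evaluations (signal > -1).
    Assumption: 'difference' means absolute difference.
    """
    unanimous_negative = []
    others = []
    for o, o_pred in zip(actual, predicted):
        diff = abs(o - o_pred)
        if o == -1.0:
            unanimous_negative.append(diff)
        else:
            others.append(diff)
    if not unanimous_negative or not others:
        raise ValueError("both groups must contain at least one evaluation")
    mean_negative = sum(unanimous_negative) / len(unanimous_negative)
    mean_others = sum(others) / len(others)
    # Each group contributes half of the error, regardless of its size.
    return (mean_negative + mean_others) / 2.0
```

Because each group is averaged before the two are combined, the large majority of unanimous negative evaluations cannot dominate the error.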

The popular, continuous-value optimization method 
CMA-ES (Hansen et al., 2003) was used to train RNNs 
against each of the 100 training sets, for both robots, result¬ 
ing in 200 runs of CMA-ES. For each run, a random initial 
solution array was used. Synaptic weights were bounded on
[-1, 1], with an initial step size of 0.1. At the termination of
each run, the RNN with lowest error was extracted, leading 
to 100 RNNs for each robot. 
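As an illustration of the network topology described in this section (one synapse from each touch-sensor input to a single output node, plus a recurrent synapse on that output), the following Python sketch runs a sequence of sensor readings through such a network. The `tanh` activation and the function name `rnn_predict` are assumptions made for the example; the source does not state the activation function.

```python
import math


def rnn_predict(input_weights, recurrent_weight, sensor_steps):
    """Return the output node's value after all time steps.

    input_weights: one weight per touch sensor (3 for the simple robot,
                   7 for the complex robot, following the text).
    recurrent_weight: weight of the output node's self-connection.
    sensor_steps: sequence of per-time-step sensor readings (0 or 1 each).
    Assumption: tanh activation, which keeps the output in [-1, 1].
    """
    output = 0.0
    for readings in sensor_steps:
        total = recurrent_weight * output
        total += sum(w * s for w, s in zip(input_weights, readings))
        output = math.tanh(total)
    return output
```

In this sketch the candidate solution optimized by CMA-ES would be the flat vector of input weights followed by the recurrent weight, each bounded on [-1, 1].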


Phase II Results 

The ability of these 200 models to generalize their predic¬ 
tions to unseen controllers was measured as follows. The 
mean error of each model when exposed to its testing set 
was first computed using Eqn. 1. Then these errors for the 
100 models for each robot were in turn averaged. The re¬ 
sulting mean errors are reported as ‘experiment’ in Fig. 4. 

In order to determine the accuracy of these predictions,
the models were also exposed to the same testing set, but
the normalized reinforcement signals in the set were randomly
permuted (‘permuted control’ in Fig. 4). A second
control experiment was also conducted in which RNNs with
randomly-assigned synapses also made predictions on the
same, unpermuted test sets (‘random control’ in Fig. 4).

Figure 4: Results from the predictive model on unseen test
data, the permuted test data set, and random RNNs on test
data. *** denotes p < .001 as reported by a Student’s t-test.
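The permuted control can be sketched in Python as follows. The helper names `permute_signals` and `mean_abs_error` are hypothetical, and the plain mean absolute error stands in for whichever error function is applied to the test set.

```python
import random


def mean_abs_error(actual, predicted):
    """Plain mean absolute error, used here only as a stand-in metric."""
    return sum(abs(o - p) for o, p in zip(actual, predicted)) / len(actual)


def permute_signals(actual, rng):
    """Return a shuffled copy of the reinforcement signals.

    Shuffling breaks the pairing between each controller's sensor data
    and the crowd's response while keeping the signal distribution intact.
    """
    shuffled = list(actual)
    rng.shuffle(shuffled)
    return shuffled


def permuted_control_error(actual, predicted, rng):
    """Error of the unchanged predictions against permuted signals."""
    return mean_abs_error(permute_signals(actual, rng), predicted)
```

If a model has captured a real relationship, its error on the true pairing should be lower than its error against the permuted signals.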

Discussion 

Fig. 4 shows that both robots employed here were able to 
successfully ground at least one symbol (‘jump’) in their 
own sensorimotor experiences. Further, since this symbol 
and its meaning were provided solely by human subjects, 
this suggests that the crowd reached some implicit, mutually 
agreed upon recognition of this category of behavior in the 
robots, and that they were able to pass this category on to 
non-humanoid robots through a simple interface in a short 
time period, with no explicit reward from the investigators. 

Precisely how the robots may have grounded “jump” re¬ 
mains unknown. It may be the case that robots who tended 
to spend less time touching the ground were more likely to 
be positively reinforced by the crowd. This hypothesis is 
supported by the finding that there is a negative correlation 
between the proportion of time a robot spent with at least one 
body segment touching the ground and the crowd’s normal¬ 
ized reinforcement signal, for both robots (Fig. 5). Despite 
this, it is unclear whether the models learned this relation¬ 
ship, or instead discovered some more subtle function of the 
touch sensor data that better predicts the crowd’s response. 
In a similar manner, it is not clear if each robot grounded 
this symbol in the same way. 
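The proportion of time grounded used in this comparison can be computed directly from the binary touch-sensor matrix. The sketch below assumes the matrix is stored as a list of time steps, each holding one 0/1 value per body segment; the function name is hypothetical.

```python
def proportion_time_grounded(touch):
    """Fraction of time steps in which at least one touch sensor reads 1.

    touch: list of time steps; each time step is a list of 0/1 values,
           one per body segment (an assumed storage layout).
    """
    if not touch:
        raise ValueError("touch data must contain at least one time step")
    grounded_steps = sum(1 for step in touch if any(v == 1 for v in step))
    return grounded_steps / len(touch)
```

A robot that spends most of its evaluation airborne yields a value near 0, while one that never leaves the floor yields 1.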

Conclusions and Future Work 

Here we have demonstrated a unique methodology for 
enabling robots to gradually acquire language: crowd¬ 
generated language symbols are grounded by detecting 
correlations between sensorimotor experience and crowd¬ 
generated reinforcement. 

Given the unconstrained nature of the interface, the subjects
could have tried to teach the robots unlearnable commands
such as ‘xyzf’ or ‘prove Fermat’s Last Theorem’.


[Figure 5 panels: “Proportion Time Grounded and Normalized Reinforcement Signal for Simple Robot with "jump" Command”; “Proportion Time Grounded and Normalized Reinforcement Signal for Complex Robot with "jump" Command”]

Figure 5: Comparison of normalized reward signal and
proportion of time grounded for the 1037 simple and 675
complex robots with the “jump” command. Proportion of
time grounded is the number of time steps where at least
one sensor value is 1 divided by the total number of time
steps. These values are negatively correlated with p < .001.


Despite the fact that there was a long tail of infrequently- 
proposed commands such as these, the most frequently- 
issued commands (Table 2) were motoric as well as appro¬ 
priate for the robot’s morphology (as opposed to ‘clap your 
hands’). This suggests that the observed behavioral limita¬ 
tions of the robots may have steered the crowd toward at 
least one command that, with sufficient reinforcement, was 
learnable (‘jump’). This observation accords with Roy et al. 
(2009), who showed that three human caregivers constrained 
their utterances given the current language abilities of a hu¬ 
man child. 

Because of the recent success of deep learning ap¬ 
proaches, much work in AI has become focused on recogni¬ 
tion rather than understanding. Furthermore, recognition is 
much easier to measure than understanding: The ability of 
an algorithm to recognize human faces in an image is much 
easier to quantify than its understanding of humans. Our ap¬ 
proach is predicated on the speculation that understanding— 
even the understanding of abstract concepts—is ultimately 
grounded in sensorimotor experience. We provide a method 
for quantifying this in the domain of language: the robots 
tested here understand the word ‘jump’ in the sense that they
have learned an association between that word, a set of actions
generated in response to that word, and the crowd’s
responses to those actions.

For this paper, a small subset of the overall data set ac¬ 
quired during deployment was analyzed. However, many 
other potentially groundable symbols were provided by the 
crowd. Future work will involve instrumenting the robots 
with more sensors to attempt the grounding of more of these 
symbols, and expanding the models such that they can po¬ 
tentially ground more than one symbol, or detect semantic 
similarities between grounded symbols. Further, since we 
have data corresponding to different morphologies, we may 
be able to discover if morphology impacts the way a robot 
grounds symbols. In subsequent deployments of the sys¬ 
tem we also plan to enable robots to ground symbols in real 
time rather than retroactively. Also, we wish to exploit the 
fact that the models form a theory of group mind about the 
crowd to minimize user fatigue: one robot should be able to 
predict how the crowd will react to another robot, even be¬ 
fore the latter robot is shown to them. In this way, not every 
robot would need to be reinforced by the crowd. 

Finally, we wish to investigate how increasingly complex 
robot morphologies, task environments, and behaviors influ¬ 
ence the crowd’s behavior such that they incrementally train 
the robots to understand increasingly abstract language. 
Abstract

Due to the replacement of natural flora and fauna with ur¬ 
ban environments, a significant part of the earth’s organisms 
that function as primary consumers have been dispelled. To 
compensate for the reduction in the number of primary consumers,
robotic systems that mimic plant-like organisms are of
interest for their potential functional and aesthetic
value in urban environments. To investigate how to
utilize plant developmental strategies in order to engender ur¬ 
ban artificial plants, we built a simple evolutionary model that 
applies an L-System based grammar as an abstraction of plant 
development. In the presented experiments, phytomorpholo¬ 
gies (plant morphologies) are iteratively constructed using a 
context sensitive L-System. The genomic representation of 
the L-System is subject to mutation by an evolutionary al¬ 
gorithm. These mutations thus alter the developmental rules 
of these phytomorphologies. We compare the differences be¬ 
tween the light absorption of evolving virtual plants that re¬ 
main static during their life and virtual plants that possess 
the possibility to move joints that link the separate parts of 
the virtual plants. Our results show that our evolutionary algorithm
did not exploit potentially beneficial joint actuation;
instead, mostly static structures evolved. The results of our
evolving L-System show that it is able to create various phytomorphologies,
although the results are preliminary and
will be more thoroughly investigated in the future.

Introduction 

The development of phytomorphological elements of plants 
ultimately arose from a dynamic interaction between ge¬ 
netic, ontogenetic and environmental forces. Phytomorpho¬ 
logical traits have emerged through the evolution and selec¬ 
tion of plants, favoring those that were adequately adapted 
to their environment. Different environments stimulate the 
development and evolution of specific qualities in plants and 
contribute to the adaptation of plants to specific environmental
niches. Light absorption is one of the most essential characteristics,
prevalent in almost all plants. The resulting role of
plants as primary consumers conveys their fundamental impact
on any terrestrial ecosystem. Urban environments have
replaced a large share of plant-rich environments meaning 
that the potential energy up-take in these environments is 
exposed and primed for solar exploitation. For an efficient, 


but still aesthetically pleasing, deployment of solar cells, 
we investigate the developmental processes manifested by 
years of plant evolution. Hence, we are interested in gain¬ 
ing insights into how plant development works and how this 
can be mimicked in intelligent robotic and autonomous sys¬ 
tems. In order to investigate how to properly embody such 
systems, an evolutionary developmental simulation model 
was created for investigating various factors that have contributed
to the evolution of phytomorphologies. In the context
of flora-robotica, a four-year project funded under the ‘EU-Horizon
2020 Future and Emerging Technologies Proactive
Action’, the developmental methodology for creating artificial
plants, and eventually robotic and autonomous systems,
is developed to investigate how these systems may emerge
from the simulated evolution of developmental systems of
virtual plants.

Various signaling mechanisms have evolved to communi¬ 
cate environmental factors to remote cells and tissues. More¬ 
over, the cell walls of plant cells contribute to the relative 
immobility as well as the rigidity of plants, limiting cell migration
and actuation. Lacking a nervous system, plants
are forced to utilize signaling molecules for communication.
These molecules compensate for the lack of efficient communication
mechanisms through various diffusion and transduction
pathways. The signaling molecules can be transported
through an apoplastic (through the cell wall) or symplastic
(via the cytoplasm; through plasmodesmata) pathway.
Various molecules can also be transported over long
distances through the vasculature of the plant. Although 
plants acquired efficient dynamic behavior that directly in¬ 
fluences morphogenesis, we are interested in seeing whether 
phytomorphologies can emerge from simpler abstractions. 
Since actual robotic implementations of evolved phytomorphologies
are likely not able to grow or move once created,
grammars seem to be a suitable method to implement.
Conventionally, development through local cell communication
(or tissue communication) can be simulated by simple
grammars (Lindenmayer and Jürgensen, 1992) while more
complex communication can be mimicked by implementing
morphogens (Wolpert, 1969). Morphogens seem to be more
relevant abstractions taken from biology than L-Systems,
though L-Systems are easier to implement. Moreover, since
L-Systems work with variables, they can potentially be extended
to contain signal propagation algorithms and even
morphogens themselves. In this paper, L-Systems are implemented
to engender the phytomorphogenesis of artificial
plants with the aim of evaluating the possible generation of
physical systems that act as primary energy consumers in
urban environments.

Figure 1: Eight of the best evolved static individuals with
their respective fitness values.

Background 

Phytomorphogenesis 

Variation in plant features is influenced by many factors
including ecophysiological, phenological, morphological
and ontological traits. Other important factors driving
plant evolution include resource allocation, biochemistry,
metabolism, and leaf morphology and function (Ackerly
et al., 2000). All the genes are in turn subject to evolution
and specific genetic components are selected across
generations. The absolute fitness value of a plant is roughly
determined by the number of viable seeds it produces during
its lifetime, which is promoted by the previously mentioned attributes.

It has been shown that the photosynthetic rate of leaves
has a direct influence on the absolute fitness of
Arabidopsis thaliana. One specific gene (At1g61800)
causes leaves to produce more chloroplasts when plants
were placed in a different environment where they were
subjected to a higher light intensity (Athanasiou et al.,
2010), demonstrating the importance of dynamic feedback
for plants. However, solar cells are less affected by environ¬ 
mental factors such as temperature and do not necessarily 
have to rely on complex feedback processes to function op¬ 
timally. Since dynamic behavior in plants is usually a result 
of various compromises taken to optimize for survival and 
reproduction, we conceive that plants grown in controlled 
conditions do not have to rely on dynamic feedback as much. 
We therefore focus on investigating the more intrinsic prop¬ 
erties of plants that contribute to the static generation of phy¬ 
tomorphologies within a degree of stochasticity. 

Phyllotaxis is the main factor driving phytomorphogenesis
(Cells, 1997). The most common patterns formed in
plants through phyllotaxis include distichous, spiral, decussate
and whorled patterns (Kuhlemeier, 2007). Notably, the
divergence angles of primordia of the plants are usually
180°, 90° or 137.5° (Newell and Shipman, 2005), along with some
other more uncommon angles (Kuhlemeier, 2007). These,
mostly unimodal, angles influence how well the leaves
sprouting from the primordia can absorb light and overshadow
other leaves (Falster and Westoby, 2003). Leaves
can also be positioned at a certain level of steepness which
is advantageous for either preventing self-shading or capturing
light from the morning and evening sun (Falster and
Westoby, 2003). Steeper angles of lamina are also more
beneficial for plants that receive an amount of light higher
than the maximum photosynthetic potential of a plant. When
the leaves are steeply oriented, other leaves, that would otherwise
be overshadowed, can receive more light and thus
the overall photosynthetic activity of the plant is increased.
Other evolutionary trade-offs that emerge in the leaves of
plants include e.g. mass-to-area ratio, sap flow versus heat
processing, CO2 uptake to water loss ratio and the leaf size-to-number
ratio (Nicotra et al., 2011). Moreover, hormones,
such as auxin, play an important role in embryonic development,
cellular elongation and phyllotaxis (Prasad and
Dhonukshe, 2013). Despite the importance of these driving
factors for the development of plants, including them would
greatly convolute the evolutionary search space.

Simulated models 

Computer models of plants have generally been implemented
in computer graphics (Habel et al., 2009), for
accurate modeling of plant dynamics (Runions et al.,
2014; Cournede et al., 2008; Merks and Guravage, 2013;
Prusinkiewicz and Runions, 2012) and for assessing the role
of evolution on the emergence of plant traits (Valladares and
Pearcy, 2000). Moreover, evolutionary computation and
generative encodings have been implemented to efficiently
simulate plant models (Zamuda and Brest, 2012, 2014) with
some biological accuracy. In previous work on generating
patterned morphologies, and for keeping the morphological
encoding simple, generative encoding strategies, such as the
parametric encoding used in the work of Sims (1994),
are usually implemented since they can recursively generate
body segments. Different types of generative encoding
strategies have been developed over the past two decades
to abstract developmental strategies towards generating both
morphology and control of virtual creatures (Eggenberger-Hotz,
1997; Yeom and Park, 2010). One strategy for generating
artificial structures linked to neural networks is known
as artificial ontogeny (Bongard and Pfeifer, 2001, 2003).
In this method, an agent’s simulated spherical elements can
grow by increasing in size and splitting in two. As a result,
repeated divisions can turn a single unit into a fully
developed agent. Each separately created unit contained up
to six joints and diffusion sites. These diffusion sites could
in turn contain zero or more sensory, motor and interneurons.
Despite a promising application of artificial ontogeny
to produce plant-like structures, the implementation of neural
networks can result in a great increase of the search space,
making it a less attractive system to implement for our current
purposes.

A Lindenmayer system (L-System) is another grammatical
generative encoding approach, originally used to
mimic plant development by iteratively rewriting variables
and constants through a set of rules (Lindenmayer, 1968;
Prusinkiewicz, 1997). L-Systems can be seen as a developmental
representation of a virtual plant, comparable to other
generative encoding strategies. The similarity of L-Systems
to biology includes their modularity: rules and variables
are reused, comparable to how organisms reuse genes.
Further relevance of L-Systems to biology can be derived
from the fact that cells, or parts of plants, can change their
state, or cell fate. This determines the behavior and ultimately
the phytomorphogenesis of plant form and structure.
L-Systems are thus an attractive method to implement for
our purposes, both because they somewhat mimic biological
development and because they are simple and efficiently encoded.
L-Systems have furthermore been used to create the morphological
structure of virtual creatures with reactive controllers
(Hornby and Pollack, 2001). This approach can similarly be
effective for the generation of virtual plants.

Methodology 

Virtual Robot Experimentation Platform (V-REP) (Rohmer 
et al., 2013) is used as the simulator to create and evaluate 
plant-like robotic morphologies. The simulated components
are controlled via a C++ based DLL plugin created with Visual
Studio 2013. The plugin is divided into three parts: a genetic
algorithm, a morphology generator and a control part.


The genome of the morphology is encoded as the rules and
parameters of the L-Systems. Two experiments were performed:
16 evolutionary runs evolving static plant-like
morphologies, and 16 runs evolving plant-like
morphologies in which joints could rotate.

Genetic Algorithm 

The implemented genetic algorithm is a steady state genetic
algorithm (Wu and Chow, 1995). In our case, a random
offspring is generated asexually, without crossover, and
evaluated against a random individual in the population.
Random selection and a population size of 100 individuals
were used to keep the population somewhat diverse and to
slow the convergence of the evolving L-System to a local
optimum. The genomes of the initial population were
randomly initialized. The individuals were evaluated
based on their ability to absorb light in an environment that
only contained a flat surface, a light-source and the individual
itself. When comparing an evaluated offspring with a
random individual of the population, the offspring would
only replace the selected individual if its fitness value was
higher. Based on preliminary experiments, the mutation rate
was set to 5%, meaning that each variable of the genome
had a 5% chance of being changed. When mutating the
variables, either a completely new random value could be
assigned to the specific variable, or a local mutation could
perturb its current value. These local mutations are
most effective for exploring the local search space of a
population of individuals.
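One iteration of the steady state scheme described above can be sketched in Python. This is a minimal sketch, not the authors' C++ plugin: the genome is assumed to be a list of floats in [0, 1], the parent is assumed to be drawn uniformly at random, the two mutation types are assumed equally likely, and the local perturbation width of 0.05 is an arbitrary illustrative value.

```python
import random


def mutate(genome, rate, rng):
    """Copy the genome, changing each variable with probability `rate`."""
    child = list(genome)
    for i in range(len(child)):
        if rng.random() < rate:
            if rng.random() < 0.5:
                # Global mutation: assign a completely new random value.
                child[i] = rng.random()
            else:
                # Local mutation: small perturbation, clamped to [0, 1].
                child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0.0, 0.05)))
    return child


def steady_state_step(population, fitnesses, evaluate, rng, rate=0.05):
    """Create one asexual offspring and compare it with a random rival."""
    parent = rng.choice(population)
    child = mutate(parent, rate, rng)
    child_fitness = evaluate(child)
    rival = rng.randrange(len(population))
    # The offspring replaces the rival only if it is strictly fitter.
    if child_fitness > fitnesses[rival]:
        population[rival] = child
        fitnesses[rival] = child_fitness
    return population, fitnesses
```

Because a replacement only ever happens when the offspring is fitter, the summed fitness of the population can never decrease from one step to the next.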

Ten evaluation steps contribute to the eventual fitness
value of a virtual plant. At each time-step, the amount of
light absorbed by the simulated leaves of an individual is
calculated. The orientation and surface area of the leaves
have a direct influence on the amount of light absorbed by
the leaves. The amount of light absorbed is calculated by
multiplying the one light-sensitive surface area of the artificial
leaf by the z-directional vector of the leaf relative to
the directional vector that is oriented from the leaf’s origin
to the origin of the light-source. Furthermore, if there is anything
between the artificial leaf and the light-source, the leaf
does not contribute to the fitness value of the individual.
The light-source that directly influences the fitness of
the virtual plants is moved at each time-step, starting at the
Cartesian coordinate (2.0, -4.0, 10.0) and ending at the coordinate
(2.0, 5.0, 10.0). The sun thus moves in the direction of
y with a directional vector of (0.0, 1.0, 0.0) as illustrated in
figure 3.
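The per-leaf light calculation and the light-source path can be sketched as follows. The sketch interprets the "z-directional vector relative to the direction of the light" as the cosine between the leaf's local z axis and the unit vector pointing at the light, and clamps negative values to zero because only one face is light-sensitive; both are assumptions, as are the function names.

```python
import math


def light_positions(steps=10, start=(2.0, -4.0, 10.0), end=(2.0, 5.0, 10.0)):
    """Evenly spaced light-source positions from start to end (inclusive)."""
    return [
        tuple(s + (e - s) * i / (steps - 1) for s, e in zip(start, end))
        for i in range(steps)
    ]


def light_absorbed(area, leaf_z_axis, leaf_origin, light_origin, occluded=False):
    """Leaf area scaled by how directly its z axis faces the light."""
    if occluded:
        # Anything between leaf and light removes the leaf's contribution.
        return 0.0
    to_light = [l - p for l, p in zip(light_origin, leaf_origin)]
    distance = math.sqrt(sum(c * c for c in to_light))
    axis_length = math.sqrt(sum(c * c for c in leaf_z_axis))
    if distance == 0.0 or axis_length == 0.0:
        return 0.0
    cosine = sum(a * t for a, t in zip(leaf_z_axis, to_light))
    cosine /= distance * axis_length
    # Assumption: a leaf facing away from the light absorbs nothing.
    return area * max(0.0, cosine)
```

With ten steps between the stated start and end coordinates, the light advances one unit along y per time-step, matching the directional vector (0.0, 1.0, 0.0).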

The fitness function for each individual is given in equation 1.
The fitness F is the sum of the fitness values acquired
over the ten time-steps i, where n represents the total number of
evaluation time-steps. The total number of leaves is given by o,
and p represents the total number of objects formed by the
individual being evaluated. A represents the surface area of
the artificial leaves, which is multiplied by the z-directional
vector θ. V represents the volume of the objects.

$$F = \sum_{i=1}^{n} \left( \sum_{j=1}^{o} A_j \theta_j - \sum_{k=1}^{p} V_k \right) \qquad (1)$$

Figure 2: Three illustrations of the implemented L-System are shown. The genotypic representation shows how the production
rules result in the generation of the morphology. The symbolic representation shows the developmental instructions and the
relationships between states, similar to the representation used in the work of Sims (1994). The phenotype generated by the
example is shown on the right. Note that the + constant represents a three dimensional orientation to which a new object is
rotated relative to its parent.

L-System 

The implemented L-System was a context sensitive L-System.
In our case, the context refers mainly to the simulated
environment. For example, in order to prevent objects
from overlapping, a feedback loop to the L-System ensures 
that the created morphology does not contain any overlap¬ 
ping/colliding objects. The L-system contains a total of 10 
variables which are referred to as specific states of the ob¬ 
jects that are created. Each state of the object contains corre¬ 
sponding rule sets that define what child objects are created. 
An example of how the states, rules and constants of the L- 
System influence morphogenesis is displayed in figure 2. 

The L-System generates morphologies by iterating seven
times through the state parameters of the morphology. Seven
iterations were chosen subjectively as they seemed to exhibit
a good diversity of morphologies without requiring too
much computational power. The axiom of the L-System is a
state 0 object. Before the first iteration of the L-System, an
object in state 0 is therefore created at the center of the environment
on top of the floor. Afterwards, the first iteration
of the L-System generates the objects that the rules in state
0 produce. With only seven iterations, an object chain
from the initial object to the outermost child consists of a
maximum of 8 objects. Some loopholes in the L-System
can quickly result in a very high computational demand and
thus specific constraints are implemented. Every object in
a given state can potentially create up to six new child objects.
The maximum number of objects that can be created
is therefore limited to 50. Likewise, the number of loops the
L-System can make for generating these objects is limited to
200. To enable individuals to absorb light from the environment,
two object states of the L-System genome represent
artificial leaves that are expressed as rectangular cuboids.
These leaves are colored orange. All other states represent
spherical objects that shape the overall morphology. Spherical
objects were chosen in order to simplify calculating the
position of new objects without having to worry about collisions
and overlapping objects. The objects in four other
object states are colored red, blue, green and yellow while
the remaining objects are colored black by default. Note that
the first object created is always in state zero, which is always
colored red. An illustration of how the L-System generates
the phenotype from a specific genome is depicted in figure 2.
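The developmental loop described in this section can be sketched in Python. The sketch keeps only the stated constraints (axiom in state 0, seven iterations, at most six children per object, at most 50 objects) and omits the collision feedback, orientation constants and joint parameters. The rule format, a mapping from a state to a list of child states, is a hypothetical simplification of the genome.

```python
def develop(rules, iterations=7, max_objects=50, max_children=6):
    """Grow a morphology as a list of (state, parent_index) tuples.

    rules: dict mapping a state (int) to the list of child states that an
           object in that state creates (hypothetical genome format).
    """
    objects = [(0, None)]  # the axiom: one object in state 0, no parent
    frontier = [0]         # indices of objects created in the last iteration
    for _ in range(iterations):
        next_frontier = []
        for parent_index in frontier:
            parent_state = objects[parent_index][0]
            for child_state in rules.get(parent_state, [])[:max_children]:
                if len(objects) >= max_objects:
                    return objects  # hard cap on morphology size
                objects.append((child_state, parent_index))
                next_frontier.append(len(objects) - 1)
        frontier = next_frontier
    return objects
```

For example, a rule set in which state 0 only creates another state 0 object yields a single chain of eight objects after seven iterations, while a rule set that doubles at every step is cut off at 50 objects.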

Additional parameters are included in the L-System to en¬ 
able movement of the joints. Whether a joint moves is rep¬ 
resented by one Boolean. The angular rotation a joint can 
make per time-step is limited to 36 degrees meaning that a 
joint can rotate a maximum of 360 degrees in a positive or 
negative direction during one evaluation. 

Figure 3: Top-down view of the simulation
environment with an omnidirectional light-source
shown as a white dot in the bottom left corner. The dashed
blue line represents the movement of the light-source.


Results 

As can be seen in figure 5, the average fitness values
of the population of static virtual plants are similar to
those of the population of plants that could potentially
actuate their joints. Since the results of the evolutionary runs were not
normally distributed (confirmed by a Shapiro-Wilk test), a
Mann-Whitney U test was performed to see whether the results
were significantly different. The Mann-Whitney U test
confirms that the data is insufficient to reject its null hypothesis,
as can also be inferred by looking at the graph (figure
5). No statistical difference between the efficiency of static
versus actuated phenotypes could be seen for the number
of simulated generations. A difference might emerge when
simulating far more generations, considering that the runs
shown in figure 5 did not plateau. Although a few phenotypes
did utilize moving parts (such as in figure 7), the majority
of the phenotypes that evolved did not move. In the 16
evolutionary runs of rotating individuals, the best individuals
of the final generations seldom utilized any actuation
in joints that would change the shape of the artificial plants
significantly.

Although the fitness values depicted in the graph of figure 5
seem quite arbitrary, they can be explained with some
additional information. For example, the fitness value of the
best evolved individual (figure 6) was 23.841. Without the
negative contribution of the volume of the individual, its fitness
would have been 31.939. Dividing this value
by the number of time steps gives the average surface
area of the artificial leaves that was exposed to the light-source.
This area is corrected by the relative angle the leaves
had with respect to the light-source. 3.194 m² is thus the 2D
projection of the average light-absorbing surface area of the
artificial leaves. The total volume of an individual can
also be extracted from the negative contribution of
the volume. In the given example, the total negative fitness
contribution of the volume of the individual
was 8.099. The total volume of the simulated
individual was thus 0.8099 m³. Hence, the phenotype
seen in figure 6 represents a structure with an average light-absorption
area of 3.19399 m² and a volume of 0.80989 m³.

Figure 4: Top views of individuals from one evolutionary run,
illustrating how evolution shapes new, more efficient individuals.
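The arithmetic behind this example can be written out explicitly, using only the values reported above and n = 10 evaluation time-steps. The symbols $F_{\text{light}}$ and $F_{\text{volume}}$ are introduced here purely for illustration:

```latex
\begin{align*}
F &= F_{\text{light}} - F_{\text{volume}} = 31.939 - 8.099 = 23.840 \approx 23.841
   && \text{(agreement up to rounding)} \\
\bar{A}_{\text{projected}} &= \frac{F_{\text{light}}}{n} = \frac{31.939}{10} \approx 3.194\ \text{m}^2 \\
V_{\text{total}} &= \frac{F_{\text{volume}}}{n} = \frac{8.099}{10} \approx 0.810\ \text{m}^3
\end{align*}
```

The volume penalty is divided by the number of time-steps because the volume term is subtracted once per evaluation step.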

The phenotypes of the evolved phytomorphologies are 
quite diverse and different spiral patterned morphologies can 
be seen (figure 1). In figure 4, the best evolutionary run 
is mapped across different generations. Looking at the top 
view of this figure, one can see that the total amount of sur¬ 
face area exposed by the artificial leaves (orange rectangles) 
gradually becomes larger. 

Discussion 

In this paper, we aimed to see how an evolutionary devel¬ 
opmental algorithm can engender various phytomorpholo¬ 
gies optimized to absorb light. As can be seen in figure 
1, a wide variety of phytomorphologies evolved. Function¬ 
ally, these evolved morphologies don’t look particularly op¬ 
timal for light absorption as one would expect all the orange 
surfaces to point somewhat upwards instead of in the vari¬ 
ous directions shown in the resulting morphologies. Making 
longer evolutionary runs could shed more light on whether 
the evolutionary L-System can actually generate more efficient
models. Actuating the morphologies did not change the
population fitness values significantly when compared to the
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Figure 5: The figure depicts the average fitness values of 
the populations across generations. The runs are not signifi¬ 
cantly different from one another (p-value was 0.782) when 
using the Mann-Whitney U test). 


additional restriction to grow horizontally. In biological en¬ 
vironments, factors such as the overshadowing of neighbor¬ 
ing plants cause additional pressure that stimulate specific 
types of plants to grow tall quickly. Co-evolving the same 
L-System can therefore yield results that are more diverse 
than the ones presented in this paper. 

Considering the results, various future improvements of 
the genetic algorithm may increase the efficiency of a popu¬ 
lation to traverse the search space. Implementing a crossover 
function might definitely increase the efficiency of the evolv¬ 
ing L-System considering that specific states and rules of 
the L-System can be recombined between individuals within 
the population to make better performing offspring. As 
mentioned earlier, the implementation of neural networks 
in addition to artificial development (as done by Bongard 
(Bongard and Pfeifer, 2003)) can be interesting for devel¬ 
oping more dynamic morphologies. Morphogens (Wolpert, 
1969) are also an attractive strategy to implement in order 
to mimic long range communication in plants. An algo¬ 
rithm that checks for diversity besides quality, as has been 
implemented in novelty search (Lehman and Stanley, 2008) 
might also be useful to speed up the search process. More¬ 
over, novelty search can lead to the evolution of very distinct 
morphologies making it more useful for people that would 
like to generate phytomorphological structures for aesthetic 
purposes. 


statically simulated populations. Blind tracking of a moving 
light-source may have caused the search space to become 
more convoluted making the algorithm inept for finding so¬ 
lutions where actuation was more beneficial than not actuat¬ 
ing anything. 

The evolved virtual plants were quite voluminous consid¬ 
ering that the volume has a negative effect on the fitness 
value. However, making large objects and dispersing the 
morphology over a large area, while making leaves with 
a thin volume but large surface area, is an intuitive result 
given the simulation environment. It is expected that differ¬ 
ent phytomorphologies arise when artificial plants have an 



Figure 6: The phenotype of the best evolved individual. 
Note that object chains are surrounded by artificial leaves 
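
The crossover idea raised in the discussion, recombining states and rules of the L-System between individuals, could look roughly like the Python sketch below. The genome representation (a dictionary from state symbol to successor string), the uniform per-rule choice, and the example symbols are hypothetical and are not taken from the system described in this paper.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Uniform crossover over L-System rules.

    Each parent is a dict mapping a state symbol to its rewriting rule
    (here a string of successor symbols). For every state, the child
    inherits the rule of one randomly chosen parent that defines it.
    """
    child = {}
    for state in sorted(set(parent_a) | set(parent_b)):
        donors = [p for p in (parent_a, parent_b) if state in p]
        child[state] = rng.choice(donors)[state]
    return child


def expand(axiom, rules, iterations):
    """Rewrite the axiom in parallel; symbols without a rule are kept."""
    word = axiom
    for _ in range(iterations):
        word = "".join(rules.get(symbol, symbol) for symbol in word)
    return word


rng = random.Random(0)  # seeded for repeatability
# Hypothetical genomes; "L" stands in for a leaf-producing state.
parent_a = {"A": "AB", "B": "A"}
parent_b = {"A": "BLA", "B": "BL", "L": "L"}
child = crossover(parent_a, parent_b, rng)
phenotype_word = expand("A", child, 3)
```

Recombining whole rules keeps every inherited rule syntactically valid, which is one reason to prefer it over cutting inside a rule; whether it actually speeds up the search would have to be tested.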


Conclusion 

We have shown that our evolving L-System can create various
phytomorphologies that are evolved to maximize light
absorption. These phytomorphologies were generated with
a view to implementing them in urban environments for both
functional and aesthetic motives. Evolution did not exploit
possibly beneficial joint actuation but rather converged on
various types of static phytomorphologies instead. In the
future, this evolving L-System can be extended by implementing
additional algorithms to increase the effectiveness
of traversing the state space landscape for acquiring both
more efficient and more unique phytomorphologies.

Acknowledgment 

Project ’flora robotica’ has received funding from the European
Union’s Horizon 2020 research and innovation program
under the FET grant agreement, no. 640959.

References 

Ackerly, D. D., Dudley, S. A., Sultan, S. E., Schmitt, J.,
Coleman, J. S., Linder, R., Sandquist, D. R., Geber,
M. A., Evans, A. S., Dawson, T. E., and Lechowicz,
M. J. (2000). The evolution of plant ecophysiological
traits: Recent advances and future directions.
BioScience, 50(11).












Figure 7: An individual that rotated some of its joints during the simulation. T0, T2, T4 and T6 represent the respective time
steps 0, 2, 4 and 6.


Athanasiou, K., Dyson, B. C., Webster, R. E., and Johnson,
G. N. (2010). Dynamic acclimation of photosynthesis
increases plant fitness in changing environments. Plant
Physiology, 152(1):366-373.

Bongard, J. C. and Pfeifer, R. (2001). Repeated structure
and dissociation of genotypic and phenotypic complexity
in artificial ontogeny. Proceedings of the Genetic
and Evolutionary Computation Conference (GECCO-2001),
pages 829-836.

Bongard, J. C. and Pfeifer, R. (2003). Evolving complete
agents using artificial ontogeny. Morpho-functional
Machines: The New Species (Designing Embodied
Intelligence), pages 237-258.

Cells, K. (1997). Phyllotaxis. In The Algorithmic Beauty of
Plants, chapter 4, pages 63-123.

Cournede, P. H., Mathieu, A., Houllier, F., Barthelemy, D.,
and De Reffye, P. (2008). Computing competition for
light in the greenlab model of plant growth: A contribution
to the study of the effects of density on resource
acquisition and architectural development. Annals of
Botany, 101(8):1207-1219.

Eggenberger-Hotz, P. (1997). Evolving morphologies of
simulated 3D organisms based on differential gene expression.
Proceedings of the 4th European Conference
on Artificial Life (ECAL97), pages 205-213.

Falster, D. S. and Westoby, M. (2003). Leaf size and angle
vary widely across species: What consequences for
light interception? New Phytologist, 158(3):509-525.

Habel, R., Kusternig, A., and Wimmer, M. (2009). Physically
guided animation of trees. Computer Graphics
Forum, 28(2):523-532.

Hornby, G. S. and Pollack, J. B. (2001). Evolving L-systems
to generate virtual creatures. Computers and Graphics
(Pergamon), 25(6):1041-1048.


Kuhlemeier, C. (2007). Phyllotaxis. Trends in Plant Science,
12(4):143-150.

Lehman, J. and Stanley, K. O. (2008). Exploiting open-endedness
to solve problems through the search for
novelty. Artificial Life XI, pages 329-336.

Lindenmayer, A. (1968). Mathematical models for cellular
interactions in development, I. Filaments with one-sided
inputs. Journal of Theoretical Biology, 18(3):280-299.

Lindenmayer, A. and Jürgensen, H. (1992). Grammars of
development: Discrete-state models for growth, differentiation,
and gene expression in modular organisms.
In Rozenberg, G. and Salomaa, A., editors, Lindenmayer
Systems: Impacts on Theoretical Computer Science,
Computer Graphics, and Developmental Biology,
chapter 1, pages 3-21. Springer Berlin Heidelberg.

Merks, R. M. H. and Guravage, M. A. (2013). Building simulation
models of developing plant organs using VirtualLeaf.
In Plant Organogenesis, volume 959, pages
333-352.

Newell, A. C. and Shipman, P. D. (2005). Plants
and Fibonacci. Journal of Statistical Physics,
121:937-968.

Nicotra, A. B., Leigh, A., Boyce, K., Jones, C. S., Niklas,
K. J., Royer, D. L., and Tsukaya, H. (2011). The
evolution and functional significance of leaf shape in
the angiosperms. Functional Plant Biology,
38:535-552.

Prasad, K. and Dhonukshe, P. (2013). Polar auxin transport.
17:25-45.

Prusinkiewicz, P. and Runions, A. (2012). Computational
models of plant development and form, pages 549-569.

Prusinkiewicz, P. A. L. (1997). The algorithmic beauty of
plants.





Rohmer, E., Singh, S. P. N., and Freese, M. (2013). V-REP:
A versatile and scalable robot simulation framework.
IEEE International Conference on Intelligent Robots
and Systems, pages 1321-1326.

Runions, A., Smith, R. S., and Prusinkiewicz, P. (2014).
Computational models of auxin-driven development,
pages 1-48.

Sims, K. (1994). Evolving virtual creatures. SIGGRAPH ’94,
pages 15-22.

Valladares, F. and Pearcy, R. W. (2000). The role of crown
architecture for light harvesting and carbon gain in extreme
light environments assessed with a realistic 3-D
model. Anales del Jardin Botanico de Madrid, 58(1):3-16.

Wolpert, L. (1969). Positional information and the spatial
pattern of cellular differentiation. Journal of Theoretical
Biology, 25(1):1-47.

Wu, S. J. and Chow, P. T. (1995). Steady-state genetic algorithms
for discrete optimization of trusses. Computers
and Structures, 56(6):979-991.

Yeom, K. and Park, J. H. (2010). Artificial morphogenesis
for arbitrary shape generation of swarms of multi
agents. Proceedings 2010 IEEE 5th International Conference
on Bio-Inspired Computing: Theories and Applications,
BIC-TA 2010, pages 509-513.

Zamuda, A. and Brest, J. (2012). Tree model reconstruction
innovization using multi-objective differential evolution.
In 2012 IEEE Congress on Evolutionary Computation,
pages 1-8.

Zamuda, A. and Brest, J. (2014). Vectorized procedural models
for animated trees reconstruction using differential
evolution. Information Sciences, 278:1-21.








Theory and Measures 



Nonequilibrium thermodynamic stability: the apparent teleology of living beings 

Mario Villalobos 


Universidad de Tarapaca 

Escuela de Psicologia y Filosofia, 18 de Septiembre 2222, Arica, Chile 
Instituto de Filosofia y Ciencias de la Complejidad 
Los Alerces 3024, Santiago, Chile
mario.kirmayr@gmail.com 


Abstract 

Among physical systems, living beings are usually thought of as 
the only genuinely teleological natural systems. In this paper, I 
argue that the alleged teleology of living beings is not a real 
property but only an appearance, behind which what really 
exists is a complex version of stability. The complexity of living 
beings as stable systems has to do mainly, though not 
exclusively, so I argue, with the fact that living beings are 
dissipative structures which obey the thermodynamic principle 
of maximum entropy production. 

Living beings: stability or teleology? 

Living beings remain alive to the extent that a set of metabolic
or physiological variables (usually called critical or essential
variables) maintain their values within certain specific ranges
(called physiological or metabolic ranges) (Ashby, 1960).
Living beings’ ability to maintain, in spite of disturbances, their
physiological or metabolic condition within these ranges is
what is usually known as homeostasis. Living beings’
homeostasis is a particular version of stability, which is a 
relatively common property among physical systems. 

All stable systems, when disturbed, generate to greater or 
lesser degree a characteristic behavioral pattern that appears to 
be teleological (Ashby, 1960). They exhibit a typically 
convergent behavioral pattern; i.e., no matter which way they 
are displaced from their steady state (the variability of the 
disturbances), they always return to the same steady state. A 
pendulum, for example, regardless of the angle of the 
displacement, will always return to the same state of 
equilibrium (the resting position). It is the combination of
variability (on the side of the behavior) with invariance (on the
side of the steady state) that gives the idea of ‘flexibility’ in
the system. The system, somehow, seems to have a fixed goal
around which it is able to vary and ‘accommodate’ its behavior 
according to the different circumstances. However, the system, 
e.g., the pendulum, does not really move according to goals or 
purposes; it just follows physical laws. 
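
The convergent pattern described above can be reproduced numerically. The following Python sketch is only an illustration: the damping coefficient, the pendulum length, the time step and the chosen displacements are arbitrary assumptions, and a simple semi-implicit Euler integrator is used.

```python
import math


def settle(theta0, damping=0.5, g=9.81, length=1.0, dt=0.001, t_end=60.0):
    """Integrate a damped pendulum and return its final angle (rad).

    theta'' = -(g / length) * sin(theta) - damping * theta'
    The pendulum is released from rest at angle theta0.
    """
    theta, omega = theta0, 0.0
    for _ in range(int(t_end / dt)):
        omega += (-(g / length) * math.sin(theta) - damping * omega) * dt
        theta += omega * dt
    return theta


# Different disturbances (initial displacements in degrees) ...
displacements = [-120.0, -45.0, 10.0, 90.0, 150.0]
# ... all end close to the same resting position, theta = 0.
final_angles = [settle(math.radians(d)) for d in displacements]
```

Nothing in the update rule refers to a target: the common end state follows from the equation of motion alone, which is the sense in which the apparent goal-directedness is only stability.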

Living beings, as highly complex stable systems, are not the
exception to this rule but rather the most representative and 
strongest case (Villalobos, 2015). When dealing with 
pendulums, most of us do not find a teleological explanation 
terribly attractive, as simple physical variables are enough to 
explain their behavior. When dealing with living systems, 


however, the situation seems to change. Why is this so? Why 
are we so prone to attribute some kind of teleology to living 
beings? 

One might say that we humans simply tend to project
features of our subjective experience onto entities which are close
to our genus, and that living beings, without any doubt, are 
closer to us than pendulums. But that comment, even if true, 
does not explain the apparent teleology of living beings as a 
function of living beings themselves ; it just expresses, at most, 
a human bias. What I want to do here, instead, is to explain the 
teleological appearance of living beings taking as explanans 
the very constitution and functioning of living beings 
themselves. The question is “What is peculiar about living 
beings, i.e., their constitution and functioning, such that their 
behavior appears to be teleological?” The answer, I argue, has 
to do with the complexity of living beings as stable systems. 

The relative simplicity or complexity of a stable system has 
to do, mainly, though not exclusively, with the following 
factors: a) its dimensionality (the number of variables in which 
the system exhibits stability), b) its thermodynamic regime, c) 
the presence or absence of feedback mechanisms, and d) its 
order of stability (e.g., first-order stability, second-order 
stability). Living beings have high dimensionality, exist in far-
from-thermodynamic equilibrium conditions (i.e., they are
dissipative structures), have feedback mechanisms, and (at 
least in the case of animals) exhibit second-order stability (i.e., 
they are ultrastable systems). All these factors, I argue, enrich 
or complicate the way in which living beings generate their 
behavior as stable systems, but do not introduce any 
ontological exceptionality in terms of teleology. That is, 
although much more complex, living beings remain as 
purposeless as pendulums. 

In previous works, I have addressed all the aforementioned 
factors in some detail (Villalobos, 2015; Abramova and
Villalobos, 2015). Here I will focus only on the 
thermodynamic nature of living beings. The thermodynamic 
nature of living beings and its connection with teleology has 
been addressed by ecological theorists of perception, especially 
in the line of what they call “physical intelligence” (Turvey and 
Carello, 2012; Kondepudi, 2012; Shaw and Kinsella-Shaw, 
2012). Typically, these theorists see thermodynamics as a 
scientific ground to naturalize teleology. 

My interpretation takes a different path. I argue that 
thermodynamics, instead of giving us a scientific base to 
naturalize teleology, provides us with good reasons to
eliminate it from biology. 

Nonequilibrium thermodynamic stability 

From a thermodynamic point of view, living beings belong to a 
special group of physicochemical systems called dissipative 
structures (Prigogine and Stengers, 1984). Examples of these 
structures include Benard cells, flames, hurricanes, and 
whirlpools (Ji, 2012; Ulanowicz and Hannon, 1987). The 
peculiarity of these systems, as opposed to the so called 
equilibrium structures (or near-equilibrium structures), is that 
they are constituted in far-from-thermodynamic equilibrium 
conditions, and maintain integrity through the constant 
exchange of energy (and matter in the case of open systems) 
with the environment. In other words, they disintegrate if this 
exchange is cut off. Living beings, like any other dissipative 
structure, are systems whose region of physicochemical 
stability is far-from-thermodynamic equilibrium. This means 
that, when disturbed, they move not to equilibrium but to the 
specific far-from-equilibrium region in which they conserve 
integrity. 
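
A deliberately crude toy model can make this contrast concrete. In the Python sketch below, which is an assumption for illustration and not a model proposed in this paper, a single quantity x is fed by a constant inflow and decays at a rate proportional to x; x = 0 plays the role of thermodynamic equilibrium and x = inflow / decay the role of the far-from-equilibrium steady state.

```python
def simulate(x0, inflow, decay=1.0, dt=0.01, t_end=20.0):
    """Integrate dx/dt = inflow - decay * x with explicit Euler.

    x stands for the amount of maintained 'structure'; inflow is the
    exchange with the environment; x = 0 is the equilibrium state.
    """
    x = x0
    for _ in range(int(t_end / dt)):
        x += (inflow - decay * x) * dt
    return x


steady_state = 5.0 / 1.0  # inflow / decay, far from the equilibrium x = 0

# Disturbed in either direction, the driven system returns to the same
# far-from-equilibrium state ...
after_push_up = simulate(x0=8.0, inflow=5.0)
after_push_down = simulate(x0=1.0, inflow=5.0)
# ... but relaxes to equilibrium once the exchange is cut off.
after_cut_off = simulate(x0=steady_state, inflow=0.0)
```

The toy system has none of the spatial organization of a flame or a cell; it only shows that returning to a driven steady state, and collapsing when the drive stops, requires no goal in the dynamics.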

Every dissipative structure, at different scales, exhibits the 
same behavioral pattern of stability. If we disturb a candle 
flame in different ways (without being destructive, of course), 
we see how the flame reconstitutes as such. The same occurs 
with a maelstrom in the sea; it recovers its dynamics and 
conserves its integrity. Once a nonequilibrium steady state 
stabilizes as such, it is able to exhibit a considerable degree of 
stability (Kondepudi, 2012). Sure, the stability that a physical 
volume X can reach in far-from-equilibrium conditions is 
weaker than the stability that it may reach in equilibrium 
conditions (sooner or later, hurricanes disintegrate and living 
beings die), yet it is still a quite robust stability. As in any case 
of stability, dissipative structures seem to ‘insist,’ despite 
disturbances, on retaining their organization, and so are
susceptible to teleological descriptions. But why do dissipative 
structures exhibit stability? 

According to a relatively established hypothesis in 
thermodynamics, dissipative structures, living or not, originate, 
exist, behave and evolve following what is known as the 
‘maximum entropy production principle’ (MEPP) (Kondepudi, 
2012; Martyushev and Seleznev, 2006; Michaelian, 2011; 
Swenson, 2009; Swenson and Turvey, 1991). MEPP, roughly, 
states that given a thermodynamic gradient through a system, 
structured subsystems tend to organize and behave so as to 
maximize the production of entropy (Martyushev and 
Seleznev, 2006; Swenson, 2009). This phenomenon, in 
England’s view (2015), can be interpreted as an instance of 
‘dissipative adaptation,’ and understood as a general condition 
of nonequilibrium spontaneously organized systems. 
According to this hypothesis, dissipative structures, living or 
not, would exhibit stability as a result of MEPP (Kondepudi,
2012).

Although this hypothesis is still in need of more substantive 
empirical support, it is theoretically consistent with the general 
laws of thermodynamics (Martyushev and Seleznev, 2006). If 
correct, the idea that MEPP is behind the behavior of every 
dissipative structure would explain away, rather than retain, 
the teleological conception of living beings (or so I want to 
hold in the next and final section). 


Discussion 

Living beings face a continuous flow of disturbances, both 
internal and external, and their behavior as dissipative 
structures is a constant return to the far-from-equilibrium 
condition where they exist. We see them constantly renewing 
the exchange of energy and matter with the environment, and 
tend to interpret this behavior as indicative of purposes or 
intrinsic teleology. However, as some studies have recently
shown, ‘energy-seeking’ and adaptive behavior can equally
appear in inert simple dissipative structures such as voltage- 
driven conducting beads in a viscous medium (Kondepudi, 
Kay and Dixon, 2015). The same thermodynamic principle, 
namely MEPP, applies to both nonliving and living dissipative 
structures and seems to account for what we take to be 
purposeful behaviors. These systems’ behavior, however, 
according to the argument presented here, represents just a 
different version, namely a nonequilibrium version, of a 
fundamental and ordinary physical phenomenon: stability.
From pendulums to beads, from flames to living beings there 
seems to be a considerable and undeniable distance. Yet the 
distance, significant as it may be, seems to be a matter of 
(thermodynamic) degree, and not a matter of teleology. 
Abstract

The enactive AI framework aims to overcome the sense-making
limitations of embodied AI by drawing on the bio-systemic
foundations of enactive cognitive science. While
embodied AI tries to ground meaning in sensorimotor interaction,
enactive AI adds further requirements by grounding
sensorimotor interaction in autonomous agency. At the core
of this shift is the requirement for a truly intrinsic value function.
We suggest that empowerment, an information-theoretic
quantity based on an agent’s embodiment, represents such a
function. We highlight the role of empowerment maximisation
in satisfying the requirements of enactive AI, i.e. establishing
constitutive autonomy and adaptivity, in detail. We
then argue that empowerment, grounded in a precarious existence,
allows an agent to enact a world based on the relevance
of environmental features with respect to its own identity.

Introduction 

Enactive Artificial Intelligence (AI), as proposed by Froese
and Ziemke (2009), represents a framework for the design
and evaluation of artificial agents with the goal of fostering
intentional agency and sense-making. It results from
a critique of the embodied approach to AI (Pfeifer et al.,
2005), which was first embraced by Brooks (1991) to overcome
several hard problems of good old-fashioned AI related
to sense-making, in particular the symbol grounding
and frame problems. At the centre of Froese and Ziemke’s
critique is the fact that embodied AI allows for an agent’s
value function to be externally defined and controlled, which
counteracts genuine intentional agency. Inspired by the bio-systemic
foundations of enactive cognitive science, they require
agents to be genuinely intrinsically motivated in order
to afford constitutive autonomy and adaptivity. The goal
of this paper is to evaluate whether empowerment maximisation
(Klyubin et al., 2008), a bio-inspired, information-theoretic
candidate for intrinsic motivation, is sufficient for
the realisation of enactive agents with intentional agency.

We first present an overview of embodied AI and of how
enactive AI wants to overcome its shortcomings. This is
followed by an introduction to empowerment, and an in-depth
investigation of its role in constitutive autonomy and
adaptivity. Crucially, we do not analyse whether enactive
AI’s requirements are sufficient for intentionality and sense-making
in artificial agents, but investigate whether they can
be met by means of empowerment maximisation.

Criticising Embodied Artificial Intelligence 

Situated and embodied cognition together with Enactivism
represent three strongly interlinked theories in cognitive science.
Situated cognition suggests that cognitive processes
emerge from the interaction of an organism and its world,
and are thus inseparable from action. Embodied cognition
as defined by Rosch et al. (1992) emphasises the role of an
agent’s physical body in shaping cognitive processes. Given
that an agent’s body necessarily exists in some place, embodied
cognition presupposes situatedness. The theories
of embodied situated cognition are supported by a growing
body of empirical evidence, highlighting how constraining
bodily abilities of human participants can affect e.g. judgement
and comprehension processes (cf. Strack et al., 1988;
Gallagher, 2005; Havas et al., 2010). The theories are also
supported by research in morphological computation (Zahedi
and Ay, 2013) and exemplified by “brainless robots”
(Pfeifer et al., 2005), which perform otherwise computationally
extensive tasks such as walking only by means of their
bodily properties, e.g. the constraints and interplay of joints.

Brooks (1991) was the first to bring the ideas from embodied
situated cognition to AI research. Since then, embodied
AI has developed into a mature framework for modelling
artificial agents (cf. Pfeifer and Scheier, 2001), which
stands in opposition to good old-fashioned AI with its emphasis
on the explicit manipulation of internal symbolic representations.
We will outline the embodied approach to AI
by reference to a selection of design principles suggested
and argued for by Pfeifer et al. (2005). They were split into
groups concerning the general design philosophy (P) and the
actual design methodology (A).

Embodied AI aims at gaining new insights in the general
science of life and mind, as opposed to applied engineering
(P-1). Principle P-2 calls for a reduced designer’s influence
in order to create systems with emergent behaviour.
In the course of this, the designer will face a trade-off between
robustness and flexibility of the system (P-3). Principle
P-5 adheres to the general theory in stating, amongst
other things, that observed behaviour is neither reducible to
an agent nor to its environment, and that seemingly complex
behaviour can be triggered by simple internal mechanisms.
This is where embodied AI differs most from its traditional
counterpart. In actual design, agents should never be created
in isolation, but with their environment in mind (A-1).
Principle A-2 suggests that the proper study of intelligence
requires a holistic perspective on agents instead of looking
at sub-components only. Principle A-3 states that natural
intelligence does not come from algorithms in a central
controller, but through the organisation of an agent’s sensorimotor
loop. Consequently, A-4 says that cognition can
be best understood as “appropriate sensorimotor coordination”
(Froese and Ziemke, 2009). Finally, the value principle
(A-8) requires the agent to be supplied with information
about whether a certain action was good or bad, in order to
motivate its behaviour.

It is tempting to believe that embodied AI makes one of
the biggest challenges of classic AI, the frame problem, obsolete.
Wheeler (2005) defines it as:

Given a dynamically changing world, how is a nonmagical
system (...) to take account of those state changes
in that world (...) that matter, and those unchanged
states in that world that matter, while ignoring those
that do not? And how is that system to retrieve and (if
necessary) to revise, out of all the beliefs that it possesses,
just those beliefs that are relevant in some particular
context of action? (Wheeler, 2005)

Embodied and situated agents seem to resolve this problem
practically: by grounding cognition in their situatedness
in a continuously changing world, they do not need to
refer to any internal representations. Nevertheless, Froese
and Ziemke point out that the presence of a closed sensorimotor
loop only addresses the first part of the definition;
what is missing is an agent’s own capacity to assign relevance
to features of the world. They particularly criticise the
value principle of embodied AI (A-8), which does not preclude
the external assignment of such values or more general
goals. More explicitly, they argue that the meaning problem
cannot be resolved by injecting values externally, and criticise
embodied AI for not demanding an intrinsic perspective.
It is a part of our goal to propose a practical solution to
this challenge.

The Enactive Approach to Artificial 
Intelligence 

The enactive approach to AI is consequently rooted in the question
of how a system can be designed in which “relevant
features of the world show up as significant from the system
perspective itself, rather than only in the perspective of
the human designer or observer” (Froese and Ziemke, 2009).
Froese and Ziemke borrow ideas from enactive cognitive
science, a theoretical framework which claims that cognition
is embodied, situated and grounded in practical activity.
At its core is the idea that individuals do not passively create
internal representations of a pre-given external world (Stewart,
2010); instead, they actively generate meaning by constructing
their Umwelt (Von Uexkull, 1982), i.e. their very
own world of significance, through interaction with the environment.
According to Rosch et al. (1992), features of the
world are not independently out there, but enacted through
an agent’s activity.

Similarly, Froese and Ziemke argue teleologically that behaviour
can only be purposeful if it is significant from the
system’s own perspective. To distinguish simple matter and
most artificial agents which are incapable of such intrinsic
concern from actual living beings, enactive cognitive science
draws on Jonas’ notions of being by being, as opposed
to being by doing (Jonas, 1982): while artificial systems can
exist without actually doing anything, living systems establish
their systemic identity in reaction to the constant threat
of becoming a non-being. The latter thus have a precarious
existence, which is continually challenged by material or energetic
requirements. In order to react to threats, they must
be able to assign significance to features of the world.

Jonas suggests that this precarious existence is biologically
rooted in an individual’s self-organisation, as captured
by the concept of autopoiesis. Introduced by Maturana and
Varela (1987), autopoiesis represents a basic mode of identity.
The term only applies to physiochemical systems, and
is generalised by the notion of organisational closure. A
system implementing organisational closure is understood
as a network of processes that generate and sustain its identity
under precarious conditions, and that form a unity in a
containing domain. In their first design principle for fully
enactive agents, Froese and Ziemke thus claim that intrinsic
teleology requires organisational closure, or in other words,
constitutive autonomy:

EAI-1 (Constitutive autonomy): the system must be capable
of generating its own systemic identity at some
level of description. (Froese and Ziemke, 2009)

This intrinsic perspective represents the enactive version of
embodied AI’s value function principle (A-8). In contrast to
the synthetic methodology of the embodied approach, it requires
the designer to establish the environmental conditions
that allow for the emergence of a self-constituting system
without direct design influence on the agent architecture.

Although this principle affords a binary significance
mechanism, Froese and Ziemke argue that it is not sufficient
for sense-making as the enaction of an Umwelt, i.e. as the
continuous evaluation of events in relation to maintaining
the system’s identity. In order to enable an agent to improve
its situation or to compensate for some encountered event, it
must be able to distinguish external events more gradually
in terms of how they could affect its internal organisation.
In other words, they require an agent’s Umwelt to not be
merely black and white. The capacity to distinguish different
tendencies towards non-existence, and to act on them in
order to move away from a precarious situation, is covered
by the concept of adaptivity as defined by Di Paolo (2005).
Additionally, an adaptive agent must be able to act upon its
environment to prevent such precarious events in the future.
The necessity of adaptivity for sense-making is covered by
the second enactive design principle:

EAI-2 (Adaptivity): the system must have the capacity
to actively regulate its ongoing sensorimotor interaction
in relation to a viability constraint. (Froese and
Ziemke, 2009)

This viability constraint can either be defined externally or
be intrinsically related to the system’s identity. Nevertheless,
an external viability constraint would not conform with
EAI-1. In summary, enactive AI complements and extends
embodied AI’s approach to move sense-making into the sensorimotor
loop, by grounding sensorimotor interaction in intentional
agency (Froese and Ziemke, 2009).

Empowerment as Intrinsic Motivation for 
Enactive Artificial Agents 

We suggest that empowerment maximisation, a principle introduced
by Klyubin et al. (2005a), represents a promising
candidate for a genuinely intrinsic value function in enactive
AI. We will briefly provide the reader with an intuition
and formal definition of empowerment and the principle of
empowerment maximisation. We will then argue that empowerment
supports the formation of constitutive autonomy
in enactive agents in both a synthetic and self-constituting
manner, and fulfils the requirements for adaptivity without
further modifications.

Empowerment and Empowerment Maximisation 

Empowerment, the quantity underlying the maximisation
principle, is defined over the relationship between an agent’s
actuators and sensors, and as such is sensitive to the agent’s
embodiment and Umwelt. It measures the influence of an
agent’s actions on its environment (controllability), and the
extent to which it can perceive this influence afterwards (observability).
In other words, empowerment quantifies the
options available to an agent in terms of availability and visibility;
it measures how much potential influence an agent
has on the world it perceives. Klyubin et al. (2008) introduce
the principle, while Salge et al. (2014b) provide an extensive
survey of motivations, intuitions and past research.

At the centre of the empowerment definition is the
interpretation of an agent’s embodiment as an
information-theoretic communication channel. For any arbitrary
separation between an agent and a world we can define sensor
variables S and actuator variables A as those states that allow
for the in- and outflow of information to the agent,
respectively. This interaction with the world is usually described as
a perception-action loop (Fuster, 2001; Touchette and Lloyd,
2000, 2004) as in Fig. 1, which can be analysed by means
of a causal Bayesian network and Pearl’s interventional
calculus (Pearl, 2000). Here, arrows imply causation between
random variables: the agent’s actions A only depend on its
sensor input S, which in turn is determined by the rest of
the system R. The latter is affected by the preceding
system state and the agent’s actions. The interventional causal
probability distribution $p(S_{t+1} \mid S_t, A_t)$ thus represents the
(potentially noisy) communication channel between actions
and future sensor states. For simplicity, the interaction
presented here is discrete in time and space. Continuous
implementations exist, e.g. for robotics (cf. Salge et al., 2014b).

Empowerment is then defined as the maximum potential 
information flow (Ay and Polani, 2008) that could possibly 
be induced by a suitable choice of actions, in a particular 
state $s_t$. This can be formalised as the channel’s capacity:

$$\mathfrak{E}(s_t) = \max_{p(a_t)} I(S_{t+1}; A_t) = \max_{p(a_t)} \left[ H(S_{t+1}) - H(S_{t+1} \mid A_t) \right]$$

Here, $I(S_{t+1}; A_t)$ represents the mutual information between
sensors and actuators, which is the difference between the
regular Shannon (1948) entropy $H(S_{t+1})$ and the conditional
entropy $H(S_{t+1} \mid A_t)$. The channel capacity is computed by
finding the action distribution that maximises the mutual in¬ 
formation. Note that this distribution just defines what the 
capacity is, and is not the actual action policy. For more in¬ 
formation on these notions see (Cover and Thomas, 1991). 
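
As an illustration of the capacity computation described above, the following minimal Python sketch uses the standard Blahut-Arimoto iteration for a discrete channel. It is not taken from the paper: the function names `mutual_information` and `empowerment`, the fixed iteration count, and the representation of the channel as a nested list `channel[a][s]` holding $p(s_{t+1} \mid s_t, a_t)$ for one fixed state $s_t$ are assumptions made for this example.

```python
import math


def mutual_information(p_a, channel):
    """I(S_{t+1}; A_t) in bits for action distribution p_a and channel[a][s] = p(s | a)."""
    n_s = len(channel[0])
    # Marginal distribution over next sensor states induced by p_a.
    p_s = [sum(p_a[a] * channel[a][s] for a in range(len(p_a))) for s in range(n_s)]
    total = 0.0
    for a, row in enumerate(channel):
        for s, p in enumerate(row):
            if p_a[a] > 0 and p > 0:
                total += p_a[a] * p * math.log2(p / p_s[s])
    return total


def empowerment(channel, iterations=200):
    """Approximate the channel capacity max_{p(a)} I(S_{t+1}; A_t) by Blahut-Arimoto.

    Returns the capacity in bits and the maximising action distribution.
    """
    n_a = len(channel)
    n_s = len(channel[0])
    p_a = [1.0 / n_a] * n_a  # start from the uniform action distribution
    for _ in range(iterations):
        p_s = [sum(p_a[a] * channel[a][s] for a in range(n_a)) for s in range(n_s)]
        weights = []
        for a in range(n_a):
            if p_a[a] == 0:
                weights.append(0.0)
                continue
            # KL divergence (in nats) between this action's outcomes and the marginal.
            kl = sum(p * math.log(p / p_s[s]) for s, p in enumerate(channel[a]) if p > 0)
            weights.append(p_a[a] * math.exp(kl))
        z = sum(weights)
        p_a = [w / z for w in weights]
    return mutual_information(p_a, channel), p_a
```

For a noise-free channel in which four actions lead to four distinct sensor states, the sketch yields 2 bits; if all actions lead to the same sensor state, it yields 0 bits. The returned action distribution only defines the capacity and is not an action policy.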

Empowerment is local, i.e. the agent’s knowledge of
the local dynamics $p(S_{t+1} \mid S_t, A_t)$ is sufficient to calculate
the quantity. The information-theoretic grounding makes it 
domain-independent and universal, i.e. it can be applied to
every possible agent-world interaction, as long as this inter¬ 
action can be modelled as a perception-action loop. This im¬ 
plies that empowerment can be computed on arbitrary agent 
morphologies, and can cope with changes being made to it. 
Because the perception-action loop can be applied to subsystems
(cf. Fuster, 2001), or to formalisations on different
levels of abstraction (choosing a more or less fine grained 
model of what actions and sensors are), empowerment can 
also be applied to an agent on different hierarchical levels. 
Finally, empowerment is task-independent, i.e. it is not eval¬ 
uated in regard to a specific goal or external reward. 

Figure 1: Causal Bayesian network of a memoryless perception-action loop unrolled in time, with the agent’s sensors S,
actuators A and the rest of the world R.

Given that empowerment does not measure an agent’s actual,
but potential influence on the environment, an agent can
choose its actions accordingly, in order to get into states with
maximum empowerment. The hypothesis behind this maximisation
principle suggests that in order to adapt to changes
in their environment, living beings tend to keep their options 
open. In other words, in absence of more specific goals, 
they prefer states in which their actions have the strongest 
potential influence on the environment. More informally, an 
empowerment-driven agent wants to be in a state where its 
different actions would have different effects on the world, 
but it does not necessarily act out all options. This goes 
hand in hand with a second hypothesis, namely that evolu¬ 
tion favoured organisms with efficient information process¬ 
ing (Polani, 2009). Empowerment can thus be understood 
as one information efficiency principle focusing on the in¬ 
terplay of actuators and sensors. Based on the properties 
outlined in the previous paragraph, empowerment maximi¬ 
sation satisfies the criteria for an intrinsic motivation func¬ 
tion as suggested by Oudeyer and Kaplan (2008). 
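
To illustrate the preference for states with more options, the following Python sketch considers a deliberately simple, noise-free corridor world that is not part of the original study. For deterministic dynamics, the channel capacity reduces to the logarithm of the number of distinct sensor states reachable by the available action sequences, so n-step empowerment can be obtained by counting end positions. The corridor, the action set `ACTIONS`, and the functions `step`, `n_step_empowerment` and `greedy_action` are hypothetical names introduced for this example, and the sensor is assumed to report the agent position directly.

```python
import math
from itertools import product

ACTIONS = (-1, 0, 1)  # hypothetical action set: step left, stay, step right


def step(pos, action, length):
    """Deterministic corridor dynamics: moving into a wall leaves the agent in place."""
    return min(max(pos + action, 0), length - 1)


def n_step_empowerment(pos, length, n):
    """log2 of the number of distinct end positions over all n-step action sequences."""
    ends = set()
    for seq in product(ACTIONS, repeat=n):
        p = pos
        for a in seq:
            p = step(p, a, length)
        ends.add(p)
    return math.log2(len(ends))


def greedy_action(pos, length, n):
    """Pick the action whose successor state has the highest n-step empowerment.

    Ties are resolved in favour of the first such action in ACTIONS.
    """
    return max(ACTIONS, key=lambda a: n_step_empowerment(step(pos, a, length), length, n))
```

In a corridor of length 7 with a two-step horizon, a position at the wall has an empowerment of log2(3) bits, whereas the centre has log2(5) bits, so the greedy rule moves the agent away from the wall. The agent only compares what it could do in each state; it does not have to act out all options.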

Empowerment and Constitutive Autonomy 

The enactive AI framework suggests that, in order to gen¬ 
erate its own identity, a system must continuously maintain 
its precarious existence (Jonas, 1968; Froese and Ziemke, 
2009). Crucially, empowerment maximisation will not, of 
itself, bring about such a precarious existence. Neverthe¬ 
less, it allows for its maintenance and supports the process 
of second order engineering for the emergence of constitu¬ 
tive autonomy. In other words, agents which maintain their 
empowerment above zero realise organisational closure. To 
support this claim, we will first show that empowerment 
serves as a proxy for an agent’s internal organisation. 

A Proxy for Internal Organisation Given that a precari¬ 
ous existence is essentially conditioned on material and en¬ 
ergetic requirements, we suggest that an agent’s internal pro¬ 
cesses should maintain its ability to satisfy these require¬ 
ments. Consequently, an agent has to maintain its capacity 
to interact with the world by changing and observing it. Em¬ 
powerment quantifies this capacity; it is non-negative, con¬ 
tinuous, and becomes zero if an agent has no influence over 
the world it perceives. Given that maintaining the ability 
to interact should be the prime objective, we infer that zero 
empowerment will inevitably lead to disorganisation. As the 
internal processes are dependent on these energetic and ma¬ 
terial requirements, we also deduce that the organisation is 
impossible to recover without external support. An empow¬ 
erment value of zero therefore marks the viability boundary 


of an agent and serves as proxy for its internal organisation. 
It does not capture an agent’s precarious existence directly, 
but the extent to which this existence could be autonomously 
maintained by means of sensorimotor interaction. 

Empowerment does not distinguish whether it is the agent 
itself which loses coherence or its surrounding world. This
is consistent with the theory of situated and embodied cogni¬ 
tion, which does not allow separation of the two in terms of 
their contribution to cognitive processes. In order to main¬ 
tain its existence, an agent has to keep both its internal pro¬ 
cesses and its surroundings organised, which is reflected in 
non-zero empowerment. Crucially, it is guaranteed to be¬ 
come zero if an agent’s precarious existence is lost from 
the point of view of autonomous regeneration, even if its 
internal organisation is still intact. This provides us with 
an alternative definition of death, which accounts for external
forces. For instance, deactivating a robot would result
in an empowerment value of zero, because there is nothing 
the robot could do in order to regain control over its senso¬ 
rimotor loop, which is in turn required to maintain its ex¬ 
istence. Consequently, an empowerment maintaining robot 
would try to hinder an external force from shutting it down. 
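
The deactivated-robot case can be made concrete with a small Python sketch. The two channels below are invented for illustration and do not come from the paper: `active` is a hypothetical two-action robot whose sensor readings depend on its action, while `deactivated` models a robot whose sensor readings are the same whatever it "does". The helper `max_information_on_grid` is likewise an assumption; it simply scans a grid of action distributions instead of computing the capacity exactly.

```python
import math


def mutual_information(p_a, channel):
    """I(S_{t+1}; A_t) in bits for action distribution p_a and channel[a][s] = p(s | a)."""
    n_s = len(channel[0])
    p_s = [sum(p_a[a] * channel[a][s] for a in range(len(p_a))) for s in range(n_s)]
    total = 0.0
    for a, row in enumerate(channel):
        for s, p in enumerate(row):
            if p_a[a] > 0 and p > 0:
                total += p_a[a] * p * math.log2(p / p_s[s])
    return total


# Hypothetical channels for a robot with two actions and two sensor states.
active = [[0.9, 0.1], [0.1, 0.9]]        # actions mostly determine the next reading
deactivated = [[0.5, 0.5], [0.5, 0.5]]   # the next reading ignores the action entirely


def max_information_on_grid(channel, steps=100):
    """Largest mutual information found over a grid of two-action distributions."""
    return max(
        mutual_information([k / steps, 1 - k / steps], channel)
        for k in range(steps + 1)
    )
```

Under these assumptions, no action distribution yields any information flow through the deactivated channel, so its empowerment is zero, while the active channel retains a capacity above half a bit.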

If the robot is deactivated nonetheless, the only option 
then is to rely on an external intervention to bring this capac¬ 
ity back. In a system where the internal organisation relies 
on a multitude of different active processes, the loss of some 
causes a chain reaction where others break down, leading to
decay of the agent, as it is helplessly exposed to the entropy 
of the world. Seligman (1975) describes this situation in 
psychological terms, i.e. from a human perspective. In a 
classical robot, turning it off is usually not as problematic, 
as most current robots do not rely on the need to continu¬ 
ously maintain and repair their systems. Nevertheless, if a 
robot is not turned back on, entropic processes will eventu¬ 
ally obliterate the robot, leading to its information-theoretic 
death, i.e. a complete loss of organisation. In summary, an 
empowerment of zero marks death in terms of the inability to 
recover autonomously. This eventually leads to information- 
theoretic death, which cannot be reversed even by means of 
external intervention. 

Also note that in contrast to other homeostatic variables, 
such as a robot’s energy level, this equivalence between death
and zero empowerment holds both ways. A robot could be
turned off whilst its energy level, an essential variable for its 
successful operation, remains high. But a robot cannot be 
turned off without its empowerment dropping to zero. 

Second-Order Homeostasis An empowerment value
over the viability boundary thus reflects an agent’s efforts 
to maintain its internal organisation and to sustain its influ¬ 
ence on the environment. This is the case even if we assume 
empowerment to be a meta-variable, i.e. if its maintenance 
is implemented via several homeostatic processes. Since 
keeping empowerment non-zero means to keep the agent’s 
internal organisation coherent, which in turn means to keep 
empowerment non-zero, we end up with a self-referential 
process. In other words, maintaining empowerment means 
preserving the capacity to maintain empowerment. It is 
this form of second order homeostasis that characterises au- 
topoiesis: “an homeostatic (...) system which has its own 
organisation (...) as fundamental variable which it maintains 
constant” (Maturana and Varela, 1980, p. 79). Since we are 
particularly interested in non-physiochemical systems, we 
make the more general claim that a system which keeps its 
empowerment non-zero realises organisational closure. 

Synthetic vs. Second Order Engineering An empower¬ 
ment maintaining system does not necessarily have to face 
precarious conditions; the latter must be specified or emerge 
from the agent’s dependencies on the environment. Never¬ 
theless, empowerment can serve as a meta-variable to inform 
the design of self-constituting agents both from a synthetic 
and emergent perspective. If we take the earlier, weaker 
stance of embodied AI and allow for some direct influence 
on the agent design, empowerment can be used as an ex¬ 
plicit intrinsic value function. Maintaining a more specific 
variable such as an agent’s energy level is not sufficient to 
ensure the coherence of the overall organisation, and thus 
does not suffice for organisational closure. If we assume 
empowerment to be implicitly implemented by several other 
variables, maintaining empowerment in turn means to main¬ 
tain all variables that are required to keep the organisation 
coherent. We thus adopt the claim that empowerment “might 
contribute to modulate pre-imprinted drives or help consti¬ 
tuting new homeostatic drives” (Klyubin et al., 2008). 

If we stick to strict second order engineering of emer¬ 
gence, empowerment can act as a primer to inform the envi¬ 
ronmental conditions required for the emergence of a self- 
constituting agent. More specifically, the environmental 
conditions must give rise to regulative processes that imple¬ 
ment dynamics similar to empowerment maximisation, in 
order to allow for the emergence of specialised homeostatic 
variables (Klyubin et al., 2008). As a meta-variable, empow¬ 
erment allows us to make less explicit assumptions about 
the specialised processes which must emerge from the envi¬ 
ronment to constitute and maintain an agent’s identity, and 
yet remains specific enough to enable a more directed pro¬ 
cess. Counterintuitively, designing the environment in a way 
that affords the emergence of an empowerment maintaining 
agent thus allows for more, not less, freedom in emergence 
and is therefore in sync with the enactive AI principles. 


A Sufficiently Intrinsic Value Function Although em¬ 
powerment is not a truly emergent property in this con¬ 
text, we argue that it is still sufficiently intrinsic to sat¬ 
isfy Froese and Ziemke’s requirements for an agent’s value 
function. It is local, and domain-independent through its 
information-theoretic grounding. This also makes it inde¬ 
pendent from any sensory semantics, a criterion brought for¬ 
ward by Oudeyer and Kaplan (2008). Embedded in the ar¬ 
chitecture of a minimal agent with a precarious existence, 
empowerment becomes grounded in the maintenance of its 
identity. Calculating empowerment either explicitly or im¬ 
plicitly then translates to assigning genuine relevance to fea¬ 
tures of the environment. 

Empowerment and Adaptivity 

We have demonstrated that keeping empowerment non-zero 
already satisfies a minimal form of adaptivity in terms of 
maintaining a precarious existence. We will show that 
this mechanism represents an abstraction of empowerment 
maximisation, a principle which naturally emerges from an
agent’s need to optimise the efficiency of its sensorimotor
interaction. Crucially, empowerment maximisation realises
adaptivity without additional complexity, e.g. more
layers in an agent’s architecture.

Distinguishing Viability Tendencies Di Paolo (2005) de¬ 
fines adaptivity as a system’s capacity to regulate its states 
and its relation to the environment with the result that: 

1. Tendencies are distinguished and acted upon de¬ 
pending on whether the states will approach or re¬ 
cede from the boundary and, as a consequence, 

2. Tendencies of the first kind are moved closer to or 
transformed into tendencies of the second and so fu¬ 
ture states are prevented from reaching the boundary 
with an outward velocity (Di Paolo, 2005). 

By quantifying the efficiency of the perception-action 
loop for different reachable sensor states, empowerment al¬ 
lows the agent to identify states that afford it more options 
relative to its sensorimotor equipment. Given the link be¬ 
tween the agent’s internal organisation and the efficiency 
of its sensorimotor loop, empowerment allows the agent to 
distinguish tendencies in the environment in terms of how 
they could potentially affect its viability, which satisfies Di 
Paolo’s first requirement. 

We want to stress that the agent does not need to possess 
a “viability set” in Di Paolo’s sense, i.e. different degrees or 
different forms of disorganisation above its viability bound¬ 
ary. Unlike the value function in embodied AI (A-8), em¬ 
powerment is future-directed and can therefore differentiate 
genuine tendencies in terms of action affordances that might 
have an impact on an agent’s viability, even if there is no 
actual robustness in the agent. 

As a necessary requirement for real-world scenarios, its
information-theoretic foundation enables it to cope with un¬ 
certainty in the sensorimotor loop. Anthony et al. (2008) 
show that empowerment allows an agent to extract and use 
local information to learn about the world’s global structure. 
An agent which improves empowerment locally in terms of 
time and space is thus likely to improve globally as well. 

Transforming Viability Tendencies Using empowerment 
maximisation as an action policy allows an agent to prevent 
states which might prove fatal, and to prefer those which 
might be beneficial. In a simulation study, we demonstrated 
that empowerment maximising agents were able to main¬ 
tain their precarious existence even under serious energy re¬ 
source constraints (Guckelsberger and Polani, 2014). With 
empowerment becoming zero when the agent has no sen¬ 
sorimotor control, there is no need to explicitly define a 
death state, and empowerment maximisation naturally leads
to death avoidance behaviour. We conclude that empowerment
maximisation fulfils Di Paolo’s aforementioned second
requirement for adaptivity in terms of sensorimotor
coordination (Di Paolo, 2005).

Several studies have investigated how empowerment 
maximisation can facilitate sensorimotor adaptation. Klyubin
et al. (2005b, 2008) show that empowerment can serve as
an immediate guide for sensor and actuator evolution during 
an agent’s lifetime. They have used empowerment as the 
fitness function in a genetic algorithm to evolve both sen¬ 
sors and actuators, while constraining the agent’s informa¬ 
tion processing bandwidth. This empowerment maximisa¬ 
tion strategy yielded sensors and actuators of different qual¬ 
ities which were “meaningful” in respect to the agent’s cur¬ 
rent state. This is possible because empowerment is not 
only well defined for different agent morphologies, but even 
makes these morphologies comparable in terms of which is 
the better fit for a given environment. 

The information-theoretic nature of empowerment allows 
for a less-biased and thus pro-enactivist approach to senso¬ 
rimotor adaptation, because it does not rely on any assump¬ 
tions about sensory modality (Oudeyer and Kaplan, 2008; 
Salge et al., 2014b). Due to its grounding in the sensorimo¬ 
tor loop, empowerment can potentially be used to modify the 
environment (Salge et al., 2014a), the agent’s morphology, 
and its sensors and actuators (Klyubin et al., 2008). Hence it 
also satisfies Di Paolo’s requirement for an agent to regulate 
not only its states, but also its relation to the environment.

Given the evidence above, we conclude that empower¬ 
ment maximisation satisfies Di Paolo’s requirements for 
adaptivity. It even exceeds them in that it allows for sen¬ 
sorimotor coordination and adaptation not just in “some cir¬ 
cumstances”, as Di Paolo (2005) requires, but in a perma¬ 
nent fashion. An empowerment maximising agent not only 
acts when there is a disaster, but continuously optimises its 
mastery of the sensorimotor loop. If the empowerment gra¬ 


dient is less steep, empowerment allows for more freedom 
in the selection of actions. 

Discussion 

We have claimed that an empowerment maintaining agent 
can be considered as implementing organisational closure. 
Nevertheless, we have not yet demonstrated that it meets 
Maturana and Varela’s second requirement for autopoiesis,
namely to constitute itself “as a concrete unity in the space 
in which the components exist (...)” (Maturana and Varela, 
1980, p. 79). Froese and Ziemke point out that there is
no mechanism available yet to test for this criterion in non- 
biophysiological systems (Froese and Ziemke, 2009). Thus, 
our argument so far is based on the assumption that such a 
boundary has somehow been established; and we demonstrated
how empowerment scales, i.e. that it can be applied
to an arbitrarily chosen boundary since it is defined on
any possible morphology. It is unclear though whether this 
boundary is maintained for an empowerment maximising 
agent emerging from second order engineering. 

Empowerment maximisation overcomes Wheeler’s
“intra-context frame problem” (Wheeler, 2008), i.e. a
system’s challenge to act appropriately and flexibly in a 
given context, by assigning potential future states relevance 
relative to its identity. Nevertheless, in order to maximise 
empowerment, an agent must infer not only potential 
future sensor states, given the current state, but also its 
action consequences in these possibly remote states. The 
obvious question arising from this is whether computing 
empowerment, or more broadly speaking, behaving as 
if one were maximising empowerment, would require
an explicit forward model. Most existing work assumes 
a somewhat acquired world model that can be queried 
(Salge et al., 2014b) but more recent work argues that a 
neural network can be trained to act as if it was maximising 
empowerment, without an explicit forward model, based 
only on past experience (Mohamed and Rezende, 2015). 
In any case it should also be noted that the formalism 
only requires an agent-centric understanding of the local 
dynamics $p(S_{t+1} \mid S_t, A_t)$ based on a level of “representation”
consistent with the idea of sensorimotor contingencies
(O’Regan and Noe, 2001), i.e. an understanding of the 
regularities of the agent’s own sensorimotor loop. 

Revisiting enactive AI’s design principles through the lens 
of empowerment yields that they cannot be as clearly sepa¬ 
rated as Froese and Ziemke suggest; there must be an im¬ 
plicit value function already in place to maintain the consti¬ 
tutive autonomy of an agent. Adaptivity could resort to the 
same value function, if the latter is powerful enough to dis¬ 
tinguish different viability tendencies. This is the case for 
empowerment, which scales seamlessly across both require¬ 
ments without further modifications. 

Our investigations also shed light on the issue of robust¬ 
ness: while Froese and Ziemke take physical robustness, i.e. 
the existence of a set of non-fatal events, for granted in autopoietic
systems, we believe that in the realm of artificial
agents, we must allow for systems which can disintegrate 
in an instant. One might argue that this does not allow for 
an Umwelt to be constituted, but this is only correct if we 
think of a value function in embodied AI’s terms, determin¬ 
ing “whether an action was good or bad”. Empowerment 
as a future-directed motivational function in turn allows an 
agent to distinguish genuine tendencies of states to impact 
its organisation in a positive or negative way. A stochastic 
system allows for the emergence of such tendencies without 
the need for the agent to have actual physical robustness. For 
instance, consider an agent moving across a narrow bridge 
under windy conditions. Even if the agent only had a binary 
viability set, it could consider a position at the bridge’s edge 
as more risky, since the likelihood of being blown away is 
higher, which would eventually render the agent unable to 
act. Given that such tendencies allow agents to assign rele¬ 
vance to features of the world, i.e. to construct an Umwelt, 
we suggest that adaptivity is not absolutely necessary for 
sense-making. Nevertheless, we agree that it is extremely 
useful in order for agents to improve and compensate. 

Conclusion 

We have demonstrated that empowerment satisfies the re¬ 
quirements for enactive AI, i.e. constitutive autonomy and 
adaptivity. We approached these requirements separately, 
and suggested empowerment as an implicit or explicit, but 
genuinely intrinsic value function which overcomes the lim¬ 
itations of embodied AI. In particular, we argued that sus¬ 
taining empowerment is a self-referential process, and that 
empowerment-driven agents are thus autopoietic. 

We demonstrated that empowerment maximisation can¬ 
not afford a precarious existence itself, but represents a 
generic mechanism which ensures the maintenance of such 
an existence. We believe that empowerment can be realised 
by means of more specialised variables, or lead to the for¬ 
mation of such variables. By describing how empowerment 
could support the process of second order engineering for 
the emergence of constitutive autonomy, we also want to 
stress its potential role as a mediator between the synthetic 
methodology of embodied AI and the strict ideas of emer¬ 
gence in enactive AI. 

When embedded into an agent with a precarious exis¬ 
tence, empowerment will be grounded in the maintenance of 
its identity. If we take Froese and Ziemke’s claims seriously, 
we can thus assume that the relevance which empowerment 
assigns to states of the world represents genuine concern. 
We showed that the principle of maintaining empowerment, 
as required for constitutive autonomy, is simply a special 
case of maximising it. Additional layers in an agent’s archi¬ 
tecture therefore become obsolete: empowerment maximi¬ 
sation represents a mechanism which satisfies the conditions 
for adaptivity and thus allows an agent to regulate its states
and its relation to the environment to move away from its
viability boundary.

Froese and Ziemke developed the framework of enactive 
AI to advance intentional agency and sense-making in ar¬ 
tificial agents, and suggest that their requirements represent 
necessary, although potentially not sufficient conditions. We 
argue that the second requirement of adaptivity is actually 
not necessary for sense-making, but extremely useful for the 
constitution of advanced behaviour and a robust identity. Al¬ 
though they want to move away from carbon chauvinism and 
Dreyfus’ requirement to reproduce living agents in detail 
(cf. Dreyfus, 2007), their examples in (Froese and Ziemke, 
2009) are largely simulations of biochemical processes. We 
believe that minimal agents motivated by an appropriate in¬ 
trinsic motivation, such as empowerment, can serve as an 
inspiring abstraction, which could particularly support the 
selection of environmental conditions in second order engi¬ 
neering of emergence. 

Abstract

This paper presents a bottom-up approach to machine ethics, 
based on the Measurement Logic Machine (MLM). It is ex¬ 
plained how ethical notions emerge from the workings, archi¬ 
tecture, and environmental assumptions of the MLM frame¬ 
work. The MLM uses sequences of measurements to perform 
short-term predictive inference. The MLM ethical behav¬ 
ior stems from the inner evaluation of measurements that are 
used to filter the predictions. The MLM ethical discernment 
is based on measurements that detect immediate suffering in 
other agents. Also, a definition of what is an ethically positive 
modification of the inner evaluations is proposed, based on 
the notion of environmental intelligence and the correspond¬ 
ing notion of suffering. It is shown how this double approach 
is consistent with our intuitive notion of ethics. The MLM, 
with or without ethical discernment, can be used in evolu¬ 
tionary game theory, and gives clues to the search of ethical 
senses that increase the chances of survival of autonomous 
agents. 

Introduction 

Ethics (or morality) tries to answer the question of what actions
are right or wrong in specific circumstances. The increasing
interaction between humans and autonomous machines may
soon require these machines to be correctly constrained by 
ethical criteria (see, for instance, Trappl (2015) and Bostrom 
(2014)). An obvious concern is the future availability of 
lethal autonomous weapons. These machines need ethical 
discernment, the ability to distinguish right from wrong. It’s
an interdisciplinary research topic, related not only to arti¬ 
ficial intelligence, but also to philosophy, psychology and 
anthropology. 

Top-down approaches implement moral algorithms from 
explicit theories of moral behavior. Bottom-up approaches 
attempt to train or evolve agents so they will emulate cor¬ 
rect (from a human perspective) ethical behavior (see, for 
instance, Allen et al. (2005)). 

Much progress has been made to create tools that model 
the human understanding of what is right and what is wrong. 
Advances in logic programming provide techniques to han¬ 
dle ethical dilemmas (see Pereira and Saptawijaya (2016)). 
This approach relies on a prior understanding and a correct 


formalization of the situations that lead to ethical choices. 
It’s a top-down approach to ethics. A great advantage of 
the logic programming approach is that the logical reason¬ 
ing can be traced to explain how a moral choice was made. 

Deep learning may provide an alternative bottom-up so¬ 
lution to the implementation of machines with ethical dis¬ 
cernment. In the deep learning framework, machines learn 
their behavior from a massive amount of examples. This 
approach has been effective in many areas. A remarkable 
recent achievement is AlphaGo, which plays the game of Go
at a professional level (see Silver et al. (2016)). A drawback
of neural network machines is that they cannot explicitly
justify their choices. Also, creating large training sets
for ethical situations is a difficult task. 

When put to work in a real human environment, any autonomous
agent with ethical discernment first needs to identify
the relevant information that must be presented to the
logic program (or the input neurons of the deep-learning ma¬ 
chine). This problem is far from solved. Here we shall as¬ 
sume that it can be solved. 

The aim of this paper is to identify some basic features of 
an autonomous learning machine that displays ethical dis¬ 
cernment. It’s an autonomous bottom-up approach, in the 
sense that it does not rely on supervised training to achieve 
ethical behavior. Even the ethical concepts are defined from 
the workings and the architecture of the machine. 

The autonomous machine here considered is the Mea¬ 
surement Logic Machine (MLM). The MLM is a fast learn¬ 
ing machine that learns from small amounts of sequential 
data. It’s adequate for simple short-term inference in non¬ 
stationary environments. In the next sections, we shall first 
briefly explain the MLM, and how it implements the idea of 
ethical choices. A case study of cooperation in the Iterated 
Prisoner’s Dilemma will then be presented, along with fur¬ 
ther details of the MLM workings. Finally, some ideas are 
proposed for the evaluation of ethics at a broader level, how 
the MLM can be used in evolutionary game theory, and the 
possible nature of ethical senses. 

The Measurement Logic Machine

The Measurement Logic Machine (MLM) (see Castro 
(2008, 2010, 2011, 2013)) is a general fast learning frame¬ 
work that addresses the survival problem in a hostile and 
non-stationary world. It assumes the agent is a fragile open 
system that can starve or be destroyed. The MLM reinforce¬ 
ment learning mechanism is related to online learning, since 
it gets data as it interacts with the hostile environment. A 
good introduction to online learning can be found in Blum 
(1998). Recent advances in fast online learning algorithms, 
and their performance when playing against humans, are dis¬ 
cussed in Ishowo-Oloko et al. (2014). 

The MLM source code can be found at the author’s site 
https://sites.google.com/site/josefgfcastro, in the “Python 
3.4.3 (Anaconda) Source Codes” page. The reader is en¬ 
couraged to download the source code, and try the different 
iterated games that are implemented there. The MLM’s broad
effectiveness while playing very different games demonstrates
the generality of the MLM approach.

The basic functional structure of the MLM is shown in 
Fig. 1. The MLM is equipped with sensors that allow mea¬ 
surements to be made. A measurement is a recorded an¬ 
swer for some physical question. A physical question is a 
specific experimental setting, defined by the sensors used 
and the signal processing made. The measurements detect a 
few relevant features from the world, along with the MLM’s
own actions. The sequence of the most recent measurements
is constantly updated in a short-term memory (STM). The
MLM has no notion of an outer world being measured. For
the MLM, measurements are all there is.


[Figure 1: The MLM Basic Functional Structure. Blocks shown: Sensor and Measurements, STM, LTM, Partial Match, Prediction, Filtering, Choice (Action).]

Similar to any living entity, the MLM concept assumes
a skin that separates the outer world from the inner world.
The STM sequences interleave the outer world and the inner
world measurements. The MLM’s own actions are measured
during the inner world step. The MLM measures its own
actions after they were chosen and performed.

The MLM accumulates experience as it stacks some of
the STM sequences in a long-term memory (LTM). The
LTM content is then used to generate predictions and
policies from the current situation held in the STM, based
on the partial match of the current and past sequences. To
find a match, a linear search of the LTM stack is performed,
from top to bottom.

In broad terms, the MLM prediction mechanism can be
compared to the “predicting from expert advice” online
learning methods (see Blum (1998), Crandall (2014a) and
Crandall (2014b)). In the MLM case, the “experts” are the
STM sequences recorded in the LTM. The most distinctive
feature of the MLM approach is the absence of an initial
set of “experts”. The MLM generates its “experts” as it
explores the environment. Also, only the relevant “experts”
that match a given situation (the “specialists”) are consulted
at each measurement-action step. A linear search of the
LTM, from top to bottom, is used to find a promising and
reliable prediction. The prediction indicates the next action
to perform. If no prediction is found, an action is randomly
selected from the set of available actions.
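
The top-to-bottom search with a random fallback can be sketched as follows. This is only an illustration, not the author's implementation: the paper does not specify the partial-match rule, so the sketch assumes that a stored sequence matches when it begins with the current STM contents, and the names `find_prediction` and `choose_action` are hypothetical. The “good”/“bad” filtering step is left out here.

```python
import random

def find_prediction(ltm, stm):
    """Return the continuation of the first LTM sequence that starts with stm.

    ltm is a list of tag sequences ordered from top (index 0) to bottom.
    Returns None when no stored sequence matches (assumed prefix match).
    """
    n = len(stm)
    for sequence in ltm:                      # linear search, top to bottom
        if len(sequence) > n and sequence[:n] == stm:
            return sequence[n:]               # what followed this situation before
    return None

def choose_action(ltm, stm, actions, rng=random):
    """Follow the first matching specialist, or explore at random."""
    prediction = find_prediction(ltm, stm)
    if prediction is None:
        return rng.choice(actions)            # no specialist found
    return prediction[0]                      # first predicted item is the action

# Toy LTM: outer-world tags interleaved with action tags.
ltm = [
    ["CD", "D", "DD"],
    ["CC", "C", "CC"],
]
print(choose_action(ltm, ["CC"], ["C", "D"]))
```

Because the search stops at the first match, the order of the stack alone decides which “specialist” is consulted.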

To be of any value, a prediction must be reliable. This
is achieved by gradually bringing to the top of the LTM
stack the measurement sequences (the “specialists”) that
provided correct predictions, and pushing down the
sequences that provided wrong predictions. After a while,
the first “specialist” found is also the most reliable. In each
MLM, a maximum size is defined for the LTM. Sequences
that are pushed down beyond the LTM maximum size are
permanently erased. This elimination is important when the
linear search repeatedly stumbles on a “specialist” that
offers promising but unreliable predictions, blocking further
exploration.
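
A minimal sketch of this reordering is given below. It assumes the LTM is a plain Python list with the top of the stack at index 0, that a sequence moves a fixed number of slots per update, and that new sequences enter at the top; none of these details are stated in the paper, and the function names are hypothetical.

```python
def update_ltm(ltm, index, correct, max_size, step=1):
    """Move the sequence at `index` up (correct) or down (wrong) by `step` slots.

    A sequence pushed to a position at or beyond `max_size` is dropped.
    """
    sequence = ltm.pop(index)
    new_index = index - step if correct else index + step
    new_index = max(0, new_index)             # cannot rise above the top
    if new_index < max_size:
        ltm.insert(min(new_index, len(ltm)), sequence)
    # else: pushed past the bottom of the stack, so it is forgotten
    return ltm

def record(ltm, sequence, max_size):
    """Stack a new sequence on top, discarding whatever falls off the bottom."""
    ltm.insert(0, sequence)
    del ltm[max_size:]
    return ltm

stack = ["a", "b", "c"]
update_ltm(stack, 1, correct=True, max_size=3)    # "b" rises one slot
print(stack)
update_ltm(stack, 2, correct=False, max_size=3)   # bottom item is erased
print(stack)
```

The `step` argument is included so that a caller could move a sequence faster for stronger evaluations.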

Although reliable, predictions can still lead to the agent’s
(or species’) destruction. The adopted predictions must be
not only reliable, but also promising in terms of survival.
This is why the MLM predictions are filtered before being
adopted, using an inner evaluation. The inner evaluation
makes a partition of the individual measurements into
“good” and “bad” sets (possibly with a few gradations). This
“good”/“bad” partition is fully arbitrary. It does not assume
any background notion of what is good and bad from a
human perspective, and can even be unlinked to any notion
of pleasure or pain. To decide what to do next, the MLM
uses the first prediction found that is evaluated as globally
“good”. If no “good” prediction is found, the machine tries
a random action. By definition, an adequate inner evaluation
leads to choices of sequences of individual world states and
actions that consistently promote the agent’s (or species’)
survival. An adequate inner evaluation leads to adequate
filtering, and adequate filtering leads to adequate actions. A
set of co-adequate inner evaluations for a given problem can
be found by placing several MLMs in an evolutionary
setting. Of course, any particular choice of inner evaluations
faces the “No-Free-Lunch” dilemmas (see Wolpert (1996)).

An interesting consequence of the MLM learning
mechanism is the “superstitious learning” phenomenon (see
Skinner (1948)). When the MLM regularly finds (within the
time range of the STM) some “good” state, it will repeat the
irrelevant sequence of actions that precede that state.

The MLM learns fast (if there is anything to learn)
because it sticks to the first reliable and “good” predictions
found, without any concern for optimization. Since it starts
with an empty LTM, the initial random exploration will
greatly influence subsequent behavior.

One of the nice features of the MLM architecture is the
ability to provide objective meanings to otherwise vague
human concepts. If we take the STM’s most recent
measurement, we can ask what was the item preceding it.
Since the question and the answer are found in the same
recorded sequence (i.e. the STM), we say that the answer to
that question is known. If we ask, on the contrary, what is
the item that follows the STM’s most recent measurement,
the answer can no longer be found in the STM. We need to
consult the LTM memories to find a justified answer (in this
case, a prediction), based on prior experience. The answer to
that question is therefore a justified belief. As a general
principle, beliefs are generated when the question and the
answer are found in different memory structures. Knowledge
and belief are thus operational concepts that emerge from
the workings of the MLM. This allows a bottom-up approach
to epistemology. Notice, for instance, that the MLM actions
can be known only after they were performed and measured.
This understanding brings some clarity to the free-will
debate that was started with the Libet experiments (see Libet
et al. (1983), and also the recent study in Schultze-Kraft et
al. (2016)). We shall use the same bottom-up approach to
define in the next section some relevant ethical concepts.

MLM Filtering and Ethical Discernment 

The skin of an agent separates its inner world from its outer
world. For each agent, the word others refers to the outside
world entities that are inside other skins. Here we shall
assume that the notion of ethics is related to the suffering of
others as a result of one’s actions.

An ethical agent needs moral agency. Moral agency
means the agent is able to predict the consequences of its
actions, give a moral evaluation to these consequences, and
choose its actions accordingly. Moral agency and freedom to
act are distinct concepts. Even when prevented from
implementing its choices, the ethical agent still keeps its
moral agency.

We shall assume that an ethical choice is not about “love
thy neighbor”, but rather about “love thy neighbor as
thyself”. To define suffering in a game with a payoff table,
the individual immediate suffering is measured by the
amounts lost by the individual players in a single turn.
Another definition, a broader and non-immediate notion of
suffering, shall be proposed later.

With the assumption that an immediate loss means
immediate suffering, a first necessary condition for ethical
discernment is the ability to measure the opponent’s losses,
along with the agent’s own losses. These we call the ethical
measurements. If the outer world measurements only capture
the agent’s individual gains and losses, the machine is
self-centered by construction. Unable to perceive the gains
and losses of others, the machine is essentially non-ethical.
But its actions can still be seen as right or wrong by some
external observer, according to the immediate suffering
observed. The external observer can even describe the
machine’s behavior as “selfish” or “deceptive”. These words
are somewhat misleading, but they can be found in popular
descriptions of robotic behavior (see, for instance, the news
related to the article Mitri et al. (2009)).

The MLM acts based on filtered predictions. The MLM
predictions are filtered according to the inner evaluation of
states and actions. This inner evaluation is a second
necessary condition for ethical behavior. It allows filtering
out the continuations that represent a predictable loss for the
opponent, or itself. But this need not be so, because the
inner evaluations may qualify as “good” the opponent’s
suffering.

The filtering of predictions, based on the inner evaluation
of ethical measurements, defines the machine’s ethical
nature. It becomes possible to start talking of machines with
inner evaluations that are “kind” or “mean” towards other
machines. A machine that stops cooperation, taking
advantage of the other machine’s cooperation in the iterated
prisoner’s dilemma, while being able to sense and predict
the other machine’s suffering, is indeed “mean”.

Reinforcement learning requires that the inner world
measurements include information about the agent’s own
actions. A shortcut to implement actions that are seen by an
external observer as ethically correct (even with non-ethical
agents) is to directly include in the filtering of predictions
an inner evaluation of the agent’s own actions. Assigning a
“good”/“bad” evaluation to each of the machine’s possible
actions is called an action inner evaluation. The great
advantage of using an action inner evaluation is the
possibility to select predictions that actually promote the
agent’s survival, but that would be filtered out if only the
outer world measurements were considered. We shall see an
example of this in the next section.

A Case Study: The Iterated Prisoner’s Dilemma

The Iterated Prisoner’s Dilemma (IPD) is a two-player game.
A game consists of a series of simultaneous choices, the
actions taken by the two players. At each turn, the players
have two possible choices: Cooperate (C) or Defect (D). The
objective payoff matrix is presented in Table 1.

          C        D
  C     R; R     S; T
  D     T; S     P; P

Table 1: Prisoner’s Dilemma Payoff

The table indicates pairs of payoffs, with the first payoffs
referring to the player whose choices are given at the
left side of the table (called the first player). The second
payoffs refer to the player whose choices are given at the
upper side of the table (called the second player). The
condition T > R > P > S defines the prisoner’s dilemma
payoff structure. The “temptation” reward T is better than
the mutual cooperation reward R. The mutual cooperation
reward is better than the mutual defection reward P, which
is in turn better than the “sucker” payoff S. The additional
iterated game condition 2R > T+S ensures that alternating
cooperation and defection is worse than mutual cooperation.
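
As a quick check, the two conditions can be written as a one-line predicate. The function name is hypothetical; the sample values are the objective rewards quoted later in the paper for the experiments (T = 4, R = 2, P = -2, S = -4).

```python
def is_iterated_prisoners_dilemma(T, R, P, S):
    """True when T > R > P > S and 2R > T + S both hold."""
    return T > R > P > S and 2 * R > T + S

print(is_iterated_prisoners_dilemma(T=4, R=2, P=-2, S=-4))  # True
```

Adding the same constant to all four payoffs preserves both inequalities, so shifting the rewards to tune hostility does not change the structure of the game.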

An MLM prediction for the IPD is a sequence of outer
world measurements that detect the states resulting from the
players’ prior choices, interleaved with the inner world
measurement of actions. The measured outer world states are
tagged “CC”, “CD”, “DC”, and “DD”, with the first letter
describing the action of the player at the left of the table.
The measured inner world actions are tagged “C” or “D”.
Notice that the quote marks have been used to indicate that
we are talking about inner MLM tags. The MLM does not
seek information from the shape of the inner tags. For
instance, the shape of the tags cannot be used to implement
a Tit-For-Tat (TFT) strategy. All tags are atomic, and their
meaning is purely relational. By construction, the MLM has
no symbol grounding problem.

Notice that these MLMs have no ethical measurements,
or ethical discernment. They know nothing about the gains
and losses of the other player. Here we shall just examine
how different inner evaluations can generate (or not) mutual
cooperation in the IPD.

To filter predictions according to the payoffs, the MLM
must assign them a subjective evaluation. Since there are
four different payoffs, we can use just four inner evaluation
tags - “verygood”, “good”, “bad”, “verybad” - and assign
them to the measured outer world states “CC”, “CD”, “DC”,
“DD”. The first player’s inner evaluations are shown in
Table 2. The table for the second player can be obtained
with the permutation of “DC” with “CD”.

  World    Payoff                    Evaluation
  “DC”     T (temptation)            “verygood”
  “CC”     R (mutual cooperation)    “good”
  “DD”     P (mutual defection)      “bad”
  “CD”     S (sucker)                “verybad”

Table 2: First Player Payoffs Evaluation (STANDARD)

To obtain the global evaluation of a prediction, we
eliminate the pairs “verygood”/“verybad” and “good”/“bad”
that appear in it. Also, two “good” (“bad”) evaluations will
cancel a single “verybad” (“verygood”) evaluation. Whatever
remains, after all the canceling is performed, is the global
evaluation of the prediction. The MLM thus works with a
very primitive number sense that coarsely reflects the payoff
structure. For a prediction to be selected, at least one “good”
or “verygood” label must remain.
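
One way to mimic this canceling is to give the four tags integer weights and look at the sign of their sum. This numeric encoding is an assumption made for illustration, since the paper describes the cancellation symbolically; in particular, leftovers of mixed sign (for example one “verygood” with one “bad”) are resolved here by the net sum, which is an interpretation rather than a stated rule.

```python
# Assumed integer weights standing in for the symbolic tags.
WEIGHTS = {"verygood": 2, "good": 1, "bad": -1, "verybad": -2}

def global_evaluation(tags):
    """Net score left after all opposite evaluations cancel out."""
    return sum(WEIGHTS[tag] for tag in tags)

def is_selected(tags):
    """A prediction is kept only if something positive remains."""
    return global_evaluation(tags) > 0

print(is_selected(["verygood", "verybad", "good"]))  # True: one "good" remains
print(is_selected(["good", "good", "verybad"]))      # False: everything cancels
print(is_selected(["good", "bad", "bad"]))           # False: one "bad" remains
```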

As explained above, learning occurs when the LTM
sequence that was used to generate a prediction is pulled up
or pushed down in the LTM, according to its predictive
correctness. The inner evaluation of the current outer world
measurement is used to tune how fast the sequences are
moved up or down inside the LTM stack. For instance, a
correct prediction of a “verygood” state will pull up in the
LTM the corresponding “specialist” twice as fast as the
correct prediction of a “good” state.

If the first machine plays with the inner evaluations of
Table 2 and the second player with its “CD”/“DC”
permutation (we shall call these the STANDARD evaluation
tables), mutual cooperation is still possible. The MLM does
not attempt to maximize payoffs, and so mutual cooperation
may arise from the random exploration that occurs when no
“good” predictions are found. But this mutual cooperation is
fragile in the presence of noise.

The STANDARD evaluation is adequate when playing
against a fixed Tit-For-Tat (TFT) strategy, or against a fixed
Win-Stay-Lose-Change (WSLC) strategy. In both cases,
mutual cooperation dominates, even in the presence of noise.

In the IPD implementation, the four measured outer world
states (“CC”, “CD”, “DC”, “DD”) discriminate the
consolidated gains and losses of both machines. It’s easy to
assign an inner evaluation that reflects the structure of the
consolidated payoffs of both agents (2R, 2P, and T + S). We
know that 2R > 2P, and that 2R > T + S, and so we only
have a partial order. The simplest way to define the inner
evaluations for the first player is shown in Table 3. Let us
call it the KIND evaluation. As before, the KIND table for
the second player is obtained with a “CD”/“DC” permutation.

  World    Payoff                    Evaluation
  “DC”     T (temptation)            “bad”
  “CC”     R (mutual cooperation)    “verygood”
  “DD”     P (mutual defection)      “bad”
  “CD”     S (sucker)                “bad”

Table 3: First Player Payoffs Evaluation (KIND)


Notice that, although we call this inner evaluation KIND,
there is no “kind” ethical nature in this MLM. It’s a
non-ethical machine, because it lacks a sensor to measure
the suffering of the other player.

If both machines adopt the KIND evaluation, mutual
cooperation soon arises, even in the presence of significant
noise. The evaluation is adequate for both machines, and
brings the best consolidated payoffs, since 2R > T+S and
2R > 2P.

If one of the machines keeps the STANDARD evaluation
of Table 2, the KIND evaluation of Table 3 becomes clearly
inadequate. This shows that the adequacy of an evaluation
is always contextual. Even in the STANDARD vs KIND
situation, a fragile mutual cooperation can still arise, since
the KIND machine will play randomly most of the time,
unable to find a “good” prediction.

Naturally, the KIND evaluation is adequate when playing
against a TFT or WSLC strategy. Cooperation follows, and
is robust in the presence of noise.

Instead of the KIND evaluation of Table 3, it’s possible to
keep both machines playing with the STANDARD
evaluation, and add a “verybad” evaluation to action “D”, as
shown for the first player in Table 4. Let us call it the
A-KIND evaluation. The “verybad” tag of the action “D”
cancels the “verygood” tag of state “DC”, which is the best
possible situation that can follow action “D”. As a result,
predictions with action “D” tend to be filtered out.

  World    Payoff                    Evaluation
  “DC”     T (temptation)            “verygood”
  “CC”     R (mutual cooperation)    “good”
  “DD”     P (mutual defection)      “bad”
  “CD”     S (sucker)                “verybad”
  “D”      -                         “verybad”
  “C”      -                         -

Table 4: First Player Payoffs/Actions Evaluation (A-KIND)

As before, if the A-KIND evaluation is shared by both
players, mutual cooperation will soon arise, even in the
presence of a large amount of noise. A-KIND is not adequate
when playing against STANDARD, but is adequate when
playing against TFT.
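
The effect of the extra action evaluation can be illustrated with the tags of Tables 2 and 4. As an assumption made only for this sketch, the four tags are encoded as integers so that the canceling becomes a sum; `net_score` is a hypothetical helper, and tags without an evaluation (such as action “C”) contribute nothing.

```python
# Assumed integer weights standing in for the symbolic tags.
WEIGHTS = {"verygood": 2, "good": 1, "bad": -1, "verybad": -2}

# First-player evaluations: outer states only (Table 2), then with action "D" (Table 4).
STANDARD = {"DC": "verygood", "CC": "good", "DD": "bad", "CD": "verybad"}
A_KIND = dict(STANDARD, D="verybad")

def net_score(prediction, evaluation):
    """Sum the weights of every tag in the prediction that has an evaluation."""
    return sum(WEIGHTS[evaluation[tag]] for tag in prediction if tag in evaluation)

defect = ["D", "DC"]      # defect and get the temptation payoff
cooperate = ["C", "CC"]   # cooperate and get mutual cooperation

print(net_score(defect, STANDARD))   # 2: attractive under STANDARD
print(net_score(defect, A_KIND))     # 0: canceled, so filtered out
print(net_score(cooperate, A_KIND))  # 1: still selected
```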

Environmental Evaluation of Ethics 

We saw how cooperation arises with two MLMs playing the
IPD. Let us now see how we can assign a broader objective
measure to the notion of ethics. We wish that notion to
be consistent with the idea that cooperation is a good thing
while playing the IPD.

The MLM works with a sequence of measurement-action
steps. Each MLM is assumed to be an open and fragile entity
that can die: when placed in a hostile environment, it can
starve or be destroyed. To measure the MLM performance,
let us use the notion of environmental intelligence (I). This
notion of intelligence takes into account the environment’s
hostility.

To measure the environment’s hostility (H), we count the
number of times r the MLM was rescued (i.e. restored from
death, keeping its LTM past experience), while acting
randomly (i.e. with its available actions randomly chosen,
with a uniform distribution), and divide it by the number s
of the corresponding measurement-action steps performed:

$$H = \frac{r}{s}$$

The value of s must be large enough to provide a reliable
evaluation of H.

To find the environmental intelligence of the same (but
now fully working) MLM, we count the number of rescues
p for the same number s of measurement-action steps. The
I score is given by:

$$I = \frac{r - p}{s}$$

Therefore I measures how much better (or worse) the
MLM is, when compared to its randomized version, in a
given hostile environment. Notice that, when a given MLM
fully avoids destruction in a more hostile environment, it
scores a larger I in that environment. This is the reason for
the name “environmental intelligence”.

It was assumed above that the notion of ethics is related
to the suffering of others. Let us now define the suffering S
of an MLM (in its fully working mode) as the value:

$$S = \frac{p}{s}$$

This means that I = H - S. The MLM achieves maximum
intelligence when it totally avoids suffering. This notion of
suffering also requires a large enough value of s. In
this sense, it’s a much slower measurement than immediate
suffering, and we shall call it “slow” suffering.
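
The three ratios are simple enough to compute directly. The sketch below uses hypothetical function names and made-up counts (r = 20, p = 1, s = 1000) purely to show that the identity I = H - S holds by construction.

```python
def hostility(r, s):
    """H = r / s: rescues per step while acting randomly."""
    return r / s

def suffering(p, s):
    """S = p / s: rescues per step in fully working mode."""
    return p / s

def environmental_intelligence(r, p, s):
    """I = (r - p) / s: improvement over the randomized version."""
    return (r - p) / s

r, p, s = 20, 1, 1000          # illustrative counts only
h = hostility(r, s)
slow_suffering = suffering(p, s)
i_score = environmental_intelligence(r, p, s)
print(h, slow_suffering, i_score)
print(abs(i_score - (h - slow_suffering)) < 1e-12)
```

A machine that needs more rescues than its randomized version (p > r) gets a negative score.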

When several MLMs are placed together, let E be the bag
(i.e. the multiset) of their inner evaluations. We can just add
the individual I scores to get a global score $I_E$.

For a given bag of MLMs, a modification of their inner
evaluations is noted $E \to E'$. By definition, an ethically
positive modification $e^+(E \to E')$ is a modification that
increases the value of $I_E$:

$$e^+(E \to E') \Leftrightarrow I_{E'} > I_E$$

This measurement provides another kind of ethical
discernment. It’s related to the global survival benefits that
stem from the change of MLM inner evaluations, rather than
the suffering resulting from individual actions. It’s not a
particular action, but the change of inner evaluations that is
found to be ethically positive or not. This broader definition
requires measurements that count rescue (or death) rates,
instead of individual gains or losses. It does not rely on
ethical measurements that detect immediate suffering.
Actually, the notion of “slow” suffering can be at odds with
the notion of immediate suffering related to actions. This
explains many situations of difficult ethical choices, where
the long-term species’ survival conflicts with immediate
individual suffering.

Let us illustrate the definition of ethically positive
modifications with the iterated prisoner’s dilemma (IPD)
seen above. Some typical rescue frequencies are presented in
Table 5. They refer to several combinations of the machines.
The first row shows the results for the MLM in randomized
mode (Rand), playing along with the other MLM, either in
Rand mode, or featuring a few different inner evaluations
(STANDARD, KIND, A-KIND). In each pair of numbers,
the first number is the number of rescues of the machine at
the left of the table.

Notice that the Rand machine is equivalent to a machine
that gives a “bad” inner evaluation to all measured states. In
that case, every prediction is filtered out, and the machine
always acts randomly.

            Rand      STAND     KIND      A-KIND
  Rand      20; 18    37; 14    1; 31     1; 30
  STAND               25; 27    0; 27     0; 25
  KIND                          1; 2      1; 1
  A-KIND                                  1; 1

Table 5: IPD Rescue Frequencies


The rescue frequencies were obtained by averaging over ten
game rounds and rounding the figures to the nearest integer.
Each round lasted a thousand measurement-action steps. The
world hostility was tuned by adding a negative constant (-1)
to the objective rewards (T = 4; S = -4; R = 2; P = -2).
Both MLMs started with the same fixed initial cumulative
reward of 50. An MLM died when its cumulative rewards
reached zero. It was rescued simply by resetting its
cumulative rewards to the constant initial value of 50, while
keeping its memories intact. Noise level was set at 0.1,
meaning that, at each step, the selected actions of both
MLMs had a 0.1 probability of being randomly changed. In
Rand mode, the MLM played actions C or D with equal
probability. The maximum size of the LTM was set to 100 in
both machines.
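
The rescue bookkeeping can be sketched for the simplest case of two Rand machines. This is a hedged reconstruction, not the author's code: it assumes the -1 hostility constant is applied on top of the listed rewards at every step, and it omits the noise step because flipping a uniformly random action leaves it uniformly random. Under that reading the expected loss is 1 per step, which is consistent with the roughly 20 rescues per thousand steps reported for Rand against Rand.

```python
import random

# Listed objective rewards: (first player, second player) for each joint move.
PAYOFFS = {
    ("C", "C"): (2, 2),
    ("C", "D"): (-4, 4),
    ("D", "C"): (4, -4),
    ("D", "D"): (-2, -2),
}
HOSTILITY_SHIFT = -1   # assumed to be added to every reward
START = 50             # initial (and post-rescue) cumulative reward

def play_random_pair(steps=1000, seed=0):
    """Count the rescues of two machines that both act uniformly at random."""
    rng = random.Random(seed)
    wealth = [START, START]
    rescues = [0, 0]
    for _ in range(steps):
        moves = (rng.choice("CD"), rng.choice("CD"))
        for i, payoff in enumerate(PAYOFFS[moves]):
            wealth[i] += payoff + HOSTILITY_SHIFT
            if wealth[i] <= 0:          # death: rescue and keep playing
                rescues[i] += 1
                wealth[i] = START
    return rescues

print(play_random_pair())
```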

With the rescue frequencies of Table 5, changing from
Rand (or STANDARD) to KIND (or A-KIND) is always
ethically positive, notwithstanding the greater suffering of
the player that changes to KIND or A-KIND against Rand
or STANDARD. It is also ethically very positive to change
to KIND or A-KIND when the other player already uses
KIND or A-KIND. All these conclusions fit nicely with our
intuitive notion that cooperation in the IPD is an ethical
improvement.

It’s apparent that the change from Rand to STANDARD
in Table 5 is not ethically positive. There is more suffering
in a world of STANDARD machines than in a world of
Rand machines. This is not surprising. The STANDARD
machines try to take advantage of each other, and this brings
greater suffering.
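
These comparisons can be reproduced from the rescue counts transcribed from Table 5. The sketch rests on one simplifying assumption: the random-mode baseline r is treated as fixed, so the global score increases exactly when the total number of rescues of the pair decreases. The function names are hypothetical.

```python
# Rescue counts transcribed from Table 5, keyed by (row machine, column machine).
RESCUES = {
    ("Rand", "Rand"): (20, 18),
    ("Rand", "STANDARD"): (37, 14),
    ("Rand", "KIND"): (1, 31),
    ("Rand", "A-KIND"): (1, 30),
    ("STANDARD", "STANDARD"): (25, 27),
    ("STANDARD", "KIND"): (0, 27),
    ("STANDARD", "A-KIND"): (0, 25),
    ("KIND", "KIND"): (1, 2),
    ("KIND", "A-KIND"): (1, 1),
    ("A-KIND", "A-KIND"): (1, 1),
}

def total_rescues(a, b):
    """Total rescues of the pair, whichever way round it is listed."""
    key = (a, b) if (a, b) in RESCUES else (b, a)
    return sum(RESCUES[key])

def is_ethically_positive(before, after):
    """With a fixed baseline, fewer total rescues means a larger global score."""
    return total_rescues(*after) < total_rescues(*before)

print(is_ethically_positive(("Rand", "Rand"), ("Rand", "KIND")))          # True
print(is_ethically_positive(("STANDARD", "STANDARD"), ("KIND", "KIND")))  # True
print(is_ethically_positive(("Rand", "Rand"), ("Rand", "STANDARD")))      # False
```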

Discussion and Future Work 
Using the MLM in Evolutionary Settings 

The IPD was used to discuss how cooperation of actions can
emerge among a pair of MLMs. A distinct - although related
- question is the evolution of the MLM inner evaluations.
Indeed, a population of MLMs can evolve at two levels:

• Individually, all machines start with an empty LTM, and
therefore start as temporary Rand machines. They
gradually accumulate in their LTM the experience that is
filtered by the inner evaluations, to gradually generate
non-random behavior. At some point, they reach the LTM
maximum size. Let us call this the transition point from a
junior MLM to a senior MLM.

• As a population, the MLMs can be placed in an
evolutionary setting that will select co-adequate inner
evaluations for a given game. For instance, we can take
pairs of MLMs from a large population. Each paired MLM
has a fixed inner evaluation, and has reached some level of
seniority from previous pairings. Each MLM pair then plays
an IPD game of unknown, but limited, duration. The loser, if
any (since both can survive), is the MLM that dies first.
The surviving machines replicate periodically, possibly
with random mutations of their inner evaluations. This
kind of MLM evolutionary setting will be studied in future
work. It’s somewhat different from the usual two-player
evolutionary games, which are played by pairs of
agents in a large population, each “wired” to play some
pure strategy in a given game (see examples, for instance,
in Gintis (2009)). Evolutionary game theory using the
MLM goes beyond the general framework for the evolution
of cooperation that was proposed in Lehmann and
Keller (2006). In that framework, fitness is calculated
from the cost-benefit ratio of helping others. But this
ratio is quite dynamic when two MLMs play the IPD.
Also, kin relations and Hamilton’s rule (see Hamilton
(1964)), which explain some cases of cooperation in the
cited general framework, do not apply to the MLM
evolutionary setting. There is a single inner evaluation
pattern in each MLM, not a pool of heritable inner
evaluations.

Considering the results of Table 5 for the IPD game, how
can the MLM evolve from STANDARD (or Rand) to KIND
(or A-KIND) in the presence of noise? It is apparent that
the mutual KIND (or A-KIND) evaluation brings the best
$I_E$ score. But the KIND machines are wiped out in the
presence of STANDARD or Rand. It seems therefore
impossible, within an evolutionary and noisy IPD
framework, to explain the appearance and persistence of
KIND. A series of extrinsic ingredients - nurturing,
preaching, policing - are probably needed to explain it. For
instance, the idea of preaching means that contacts among
agents can change their inner evaluations. This is also a
subject for future work.

The Search For an Ethical Sense 

We saw that the MLM inner evaluations are fully arbitrary.
How do they appear in a growing agent? An MLM without
inner evaluations will act randomly. For a given population
of MLMs, bags of co-adequate evaluations may be found in
an evolutionary setting, independently of any ethical
discernment. With communicating agents, the inner
evaluations can be copied from one agent to another during
their lifetimes. But one wonders if there can be some innate
simple and fast measurement that can be used to
autonomously generate (and possibly change) the inner
evaluations, and achieve ethically positive changes (meaning
an increase of $I_E$ when the MLM grows from Rand to
some inner evaluation). Let us call this hypothetical
measurement the ethical sense. It’s different from the outer
world suffering sensors that are needed for ethical
discernment. Finding an effective ethical sense is a subject
for future work, and we shall here just present some
preliminary ideas, based on the MLM implementations.

As explained in the previous section, the definition of
ethically positive changes requires the comparison of
different scenarios with different sets of inner evaluations.
This is a slow and complex measurement, based on
historical information that cannot be sensed directly in a
single measurement-action step. It requires specific
measurements to identify the objective needed rescues (or
the deaths) of other agents. The MLM basic concept can be
extended to integrate this kind of measurement and
historical reasoning. But, in this bottom-up approach, we’re
looking for sensory abilities at the basic level that would
allow a direct generation of ethically positive inner changes.

A first idea is to have the outer world measurements
automatically generate their associated “bad” (or “good”)
evaluations, from a set of generic rules. For instance, game
payoffs below (or above) a given threshold can be associated
to a “bad” (or “good”) inner evaluation. In practice, this is
equivalent to defining directly, and from the start, the inner
evaluations of payoffs, using a {measurement: evaluation}
dictionary (with measurement meaning the measurement
result). The only interesting difference is that we can use this
generative process of new dictionary entries as an
operational definition of pain and pleasure. The generation
of a “bad” inner evaluation for a certain state is “pain”, and
the generative rule is equivalent to a nociceptor.
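
Such a generative rule could look like the sketch below. The threshold of zero, the treatment of a payoff exactly at the threshold as “bad”, and the function name are assumptions made for illustration; the payoffs are the first player's objective rewards quoted earlier for the IPD experiments.

```python
def build_evaluations(payoffs, threshold=0):
    """Generate a {measurement: evaluation} dictionary from a threshold rule.

    Payoffs above the threshold become "good"; all others become "bad"
    (an assumed tie-breaking choice).
    """
    return {
        measurement: ("good" if payoff > threshold else "bad")
        for measurement, payoff in payoffs.items()
    }

# First-player payoffs for each measured outer-world state.
first_player_payoffs = {"DC": 4, "CC": 2, "DD": -2, "CD": -4}
print(build_evaluations(first_player_payoffs))
```

Raising the threshold to 3 would leave only “DC” as “good”.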

An interesting variant is to generate evaluations for the
action measurements that preceded the payoffs. The basic
MLM, playing a version of the Iowa Gambling Task (IGT),
already implements Damasio’s idea of a somatic marker (see
Bechara et al. (1997)). The IGT is similar to a four-armed
bandit iterated game. At each IGT turn, the MLM chooses
one of four decks, and gets a reward. The second deck (deck
B) has a series of nine positive payoffs, and then suddenly a
very negative payoff that leads to an overall loss. The MLM
learning mechanism favors the frequency of wins. It does
not keep track of the accumulated payoff amounts. The
actual payoffs of each step are only coarsely reflected in the
inner evaluation structure. It will therefore often prefer deck
B, as humans often do (see Lin et al. (2007)). The somatic
marker MLM implementation generates a “bad” evaluation
for the action of selecting deck B, when the big loss occurs.
This makes the MLM eventually avoid deck B.

The operational definition of pain and pleasure is based
on an inner generative process that is, in practice, invisible to
other agents. To go further in the search for an ethical sense,
the MLM can use the capacity to infer pain and pleasure in
others, by means of outer world suffering sensors.

One obvious strategy is to mirror the other agent’s
situation, and find the corresponding suffering from previous
self-experience. The current MLM implementation already
includes some mirror abilities. An MLM can focus on
another MLM and identify the focused agent’s actions, using
the same inner tags that identify its own actions. This allows
implementing the Tit-For-Tat strategy, and even predictive
imitation. A mirror suffering sense that detects near-death
states in other agents is a plausible ingredient of an ethical
sense.

Another plausible candidate is a sense to detect
“satisfaction” in others. In the current MLM, “satisfaction”
is the only implemented emotion. It affects the
exploration/exploitation mood of the machine. It’s a number
that increases when the predictions are correct and the STM
is globally “good”. Otherwise, “satisfaction” decreases.
Higher values of “satisfaction” reduce the probability of
recording new STM sequences in the LTM. The machine
stops collecting new “experts”. Another way to express it is
to say that the MLM becomes less attentive to its STM. The
higher “satisfaction” values also reduce the rate of change
of the patterns that are used to find a partial match. The
machine settles down in satisfactory solutions. Emotions are
an essential MLM feature that provides stability to the MLM
learning process.

The measurement of near-death states and emotions in
other agents is greatly simplified if those agents are able to
give objective cues about their inner situation. This leads
to the idea of crying agents. With crying agents, it’s much
simpler to detect in others their immediate suffering, or
near-death situations.

Conclusions 

It was shown how the MLM behavior stems from the inner 
evaluations that filter the MLM predictions. Ethical discern¬ 
ment of the MLM actions is easily implemented, using the 
concept of immediate suffering of other MLM. Also, a def¬ 
inition of ethically positive changes of the inner MLM eval¬ 
uations was proposed, using the concept of environmental 
intelligence. The concept of environmental intelligence in¬ 
cludes a broader (but slower to obtain) measure of the suffer¬ 
ing of a mortal agent. The MLM, together with this concep¬ 
tual framework, provides a simple and original bottom-up 
approach to machine ethics, where the ethical concepts are 
defined using the working processes and architecture of the 
machine. 

It was also explained how this approach can lead to new 
lines of investigation in evolutionary game theory. 

It was also proposed to search for an innate moral sense 
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in artificial agents that could be used to gradually generate 
the inner evaluations, while providing better chances of sur¬ 
vival. A few preliminary ideas were discussed, based on 
concrete implementations of the MLM. 

Living and life-like systems vary in viability. They are 
alive or dead, healthy or unhealthy, getting better or worse, 
or dying. Despite the ease of applying these descriptions informally, 
there do not yet exist general methods for richly 
quantifying viability or health in such systems, even when 
every aspect of the system is available for experimental variation 
and measurement. Nevertheless, for a given system 
of interest, it is sometimes possible to distinguish between 
states where the system will persist for the foreseeable future 
(these are termed viable states) and those where it will 
not. This is perhaps the most basic, binary classification of 
states in terms of viability and it can be used to identify different 
regions in ‘viability space’ (see Figure 1 and Barandiaran 
and Egbert, 2013). An improved measure would make 
it possible not just to categorize systems but to compare the 
relative viability of two states that are in the same category, 
i.e. that are both expected to persist or both expected to 
die. This type of measure would make it possible to identify 
whether a system is becoming more or less viable, or to 
evaluate the influence of a given external perturbation upon 
viability, thereby enhancing our ability to understand and influence 
the viability of complex life-like systems. 

One way to formulate such a measure is to assume that the 
system of interest is subjected to unpredictable fluctuations 
that perturb its autonomous dynamics. If this is the case, the 
argument can be made that the farther away a viable system 
is from the viability-interface (the surface between the viable 
and non-viable regions of viability space), the more viable it 
is, as there is a smaller set of perturbations that will cause 
the system to become non-viable. (A similar argument can 
be used to describe non-viable systems as being more and 
more non-viable as their distance from the viability boundary 
increases.) There is a problem, however: the dimensions 
of viability space (i.e. the essential variables) are almost 
always measured in entirely different units, and these units 
have no relation to viability. As an example, an organism 
might require a specific range of temperature to survive and 
a specific range of atmospheric pressure. It should be clear 
that the units for measuring these phenomena relate neither 
to viability nor to each other. A perturbation 
of 3 atmospheres will in general not have the same influence 
on viability as a change of 3 degrees! Further work is needed 
if we are to develop a meaningful measure of distance in viability 
space. 1 

Figure 1: Viability class for various initial conditions in a 
simple two-dimensional model of a bio-reactor. Randomly 
sampled initial conditions plotted in red do not survive, 
whereas those plotted in black do. Details of the model are 
not relevant and are not presented in this abstract. 

In a soon to be submitted paper, we have proposed a 
method that uses the shape of the viability interface to 
rescale the system’s essential variables so as to define a 
normalized viability space, where a perturbation of a given 
magnitude has the same likelihood of crossing the viability 
interface regardless of the direction of the perturbation. The method 
works by calculating the extent to which the viability-interface 
“faces” each dimension and then scaling the values 
in that dimension by this amount. More formally, for each 
dimension of viability space, $X$, we identify $l_X$, the average 
magnitude of the $X$-component of the viability-interface 
surface normals: 

$$ l_X = \frac{\int_I \lVert \mathbf{n} \cdot \mathbf{e}_X \rVert \, dl}{\int_I dl} \qquad (1) $$

1 This problem was first brought to Egbert’s attention in a seminar 
given by Nathaniel Virgo and Simon McGregor at the University 
of Sussex in or around 2009. 
and use this value to rescale values into normalized units, 
thus: $\hat{x} = l_X x$. In the above equation, $\mathbf{e}_X$ is the basis 
unit-vector for dimension $X$, and $\mathbf{n}$ is the surface normal of $I$, 
the viability-interface. 
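
The rescaling can be illustrated with a small sketch. It is not the authors' implementation: it assumes a two-dimensional viability space and an interface approximated by a closed polyline, and the names `scale_factors` and `normalize` are hypothetical. Each segment contributes the absolute components of its unit normal, weighted by its length, which stands in for the average over the interface in Equation 1.

```python
import math

def scale_factors(vertices):
    """Approximate (l_X, l_Y) for an interface given as a closed polyline.

    Assumption: the interface is sampled as a list of (x, y) vertices and
    the last vertex connects back to the first.
    """
    acc_x = acc_y = total = 0.0
    n = len(vertices)
    for k in range(n):
        (x0, y0), (x1, y1) = vertices[k], vertices[(k + 1) % n]
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0.0:
            continue  # ignore repeated vertices
        # The unit normal of a segment is its tangent rotated by 90 degrees.
        nx, ny = dy / length, -dx / length
        acc_x += abs(nx) * length
        acc_y += abs(ny) * length
        total += length
    return acc_x / total, acc_y / total

def normalize(state, factors):
    """Rescale a state into normalized units: each coordinate times its l."""
    return tuple(l * x for l, x in zip(factors, state))

# Hypothetical interface: a rectangle that is 3 units wide and 1 unit high.
rectangle = [(0.0, 0.0), (3.0, 0.0), (3.0, 1.0), (0.0, 1.0)]
l_x, l_y = scale_factors(rectangle)
corner = normalize((3.0, 1.0), (l_x, l_y))
```

In this toy case most of the interface faces the second dimension, so that dimension is stretched relative to the first, and the rectangle becomes a square in normalized units.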

In normalized viability space, there is a meaningful mini¬ 
mal distance between any given state and the viability inter¬ 
face: on average over initial-conditions, a perturbation of a 
given magnitude will have an equal chance of crossing the 
viability-interface regardless of the angle of the perturba¬ 
tion vector. In other words, a perturbation of given magni¬ 
tude in normalized viability space has the same chance of 
transforming a randomly selected viable state into a non- 
viable state (or vice versa) whether it is a perturbation of 
one essential variable (e. g. pressure) or another (temper¬ 
ature), or a combination thereof. Figure 2 shows the same 
system as Figure 1, but plotted in normalized viability space, 
with shading used to indicate a viability gradient based upon 
the minimum distance to the viability interface. 
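
A minimal sketch of the minimum distance used for such a gradient, under the same assumptions as before (two dimensions, interface given as a closed polyline, scale factors already known). The function names are hypothetical, and the sign convention of Figure 2 is left out because it needs an additional inside/outside test.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_interface(state, vertices, factors):
    """Minimum distance from a state to the interface in normalized units."""
    def rescale(q):
        return (factors[0] * q[0], factors[1] * q[1])
    p = rescale(state)
    pts = [rescale(v) for v in vertices]
    n = len(pts)
    return min(
        point_segment_distance(p, pts[k], pts[(k + 1) % n]) for k in range(n)
    )

# Hypothetical 3-by-1 rectangular interface with scale factors (0.25, 0.75).
rectangle = [(0.0, 0.0), (3.0, 0.0), (3.0, 1.0), (0.0, 1.0)]
d_centre = distance_to_interface((1.5, 0.5), rectangle, (0.25, 0.75))
d_edge = distance_to_interface((0.3, 0.5), rectangle, (0.25, 0.75))
```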



Figure 2: The same system, plotted in a normalized viability 
space with signed-distances to the viability interface indi¬ 
cated. The lower the value, the healthier the system, with 
negative values corresponding to viable states, i. e. states 
that in the absence of external perturbation, are expected to 
persist for the foreseeable future. 

Normalizing viability space in this way allows us to com¬ 
pare states in terms of their relative viability. This in turn 
allows us to describe how a system’s viability is chang¬ 
ing over time. When additional information is available 
concerning the system’s autonomous dynamics, and/or the 
cost/difficulty of influencing the system’s essential vari¬ 
ables, it is possible to make additional observations relevant 
to the system’s viability, such as to identify the future state 
from which the minimum perturbation is necessary to cross 
the viability interface. 

Using information theoretical analysis, it is also possible 
to identify correlation between variables and these measures 
of viability. This allows us to identify and evaluate the quality 
of viability indicators, variables that are good at predicting 
a system’s viability. This connects with some of our 
previous work, where we have shown how an organism can 
respond to its own viability-indicators, and in so doing 
become capable of (i) adapting to phenomena neither it nor 
its ancestors have ever previously experienced (Egbert et al., 
2010); and (ii) adapting to changes in its own needs and abilities, 
resulting in a more evolvable organism (Egbert et al., 
2011; Egbert and Perez-Mercader, 2016). 
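
One way such an evaluation could look is sketched below. The abstract does not say which information-theoretic quantity is used; the sketch assumes mutual information between a discretized candidate variable and the binary viability class, estimated by simple counting. The function name and the sample data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )

# Hypothetical samples: viability class and two candidate indicators.
viable = [1, 1, 0, 0]
good_indicator = [1, 1, 0, 0]   # perfectly predicts the viability class
poor_indicator = [1, 0, 1, 0]   # statistically independent of it
mi_good = mutual_information(good_indicator, viable)
mi_poor = mutual_information(poor_indicator, viable)
```

A variable with a higher value would count as a better viability indicator under this assumed criterion.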

Within the enactive approach (Stewart et al., 2010), the 
concept of viability has been used to naturalize concepts of 
adaptivity, agency and normativity. In particular, Di Paolo 
(2005) compares trajectories in terms of their dynamics rel¬ 
ative to the viability boundary to formulate a definition of 
adaptivity. In a previous publication, we presented an argu¬ 
ment showing how an organism’s viability can be used to 
develop a naturalized concept of normativity (Barandiaran 
and Egbert, 2013). The research presented herein extends 
these works, providing a way to normalize viability space 
and compare states in terms of viability and to measure dis¬ 
tance from the viability boundary. 

More broadly, identifying viability-indicators in natu¬ 
ral systems could improve our ability to predict or influ¬ 
ence their viability, and similarly identifying high quality 
viability-indicators in synthesized protocells will allow us to 
better understand how to create artificial life-forms that are 
capable of surviving in the diverse conditions found outside 
of tightly controlled laboratory environments. 

Abstract 

We present arguments for why existing methods for representing 
agents fall short in applications crucial to artificial 
life. Using a thought experiment involving a fictitious 
dynamical systems model of the biosphere we argue that 
metabolism, motility, and the concept of counterfactual 
variation should be compatible with any agent representation 
in dynamical systems. We then propose an information-theoretic 
notion of integrated spatiotemporal patterns which 
we believe can serve as the basic building block of an agent 
definition. We argue that these patterns are capable of solving 
the problems mentioned before. We also test this in some 
preliminary experiments. 

Introduction 

Within artificial life the concept of an agent is fundamen¬ 
tal. While studying life-as-it-could-be (Langton, 1989), we 
also study agents-as-they-could-be. An intuitive approach 
to agents is possibly to say that while not reproducing, i.e. 
during their individual lifetime, living organisms are agents. 
The concept of an agent in this way generalizes the con¬ 
cept of living organisms by de-emphasizing reproduction 
and with it Darwinian evolution. This point of view is also 
in line with the common practice of referring to robots or 
software programs as agents. 

To give some more background (see Barandiaran et al., 
2009, for a more detailed discussion), there are a few prop¬ 
erties that seem universally acknowledged as necessary for 
something to be referred to as an agent. The first of those is 
probably the capacity to act (Schlosser, 2015). However, 
Barandiaran et al. (2009) notice that this already presup¬ 
poses a form of individuality i.e. an “entity” that this ca¬ 
pacity can be attributed to. Consequently they put the indi¬ 
viduality criterion first. Having perception is another fairly 
uncontroversial requirement (see e.g. Russell and Norvig, 
1995, who for practical reasons ignore individuality and 
only require “anything” with perception and action). The 
last concept which is often alluded to is that of some form 
of goal-directedness of the agent. The goals agents should 
strive to achieve are usually required to be in the agents’ 
own interest/intrinsic (e.g. preservation) and not the goals of 
some other agent (or programmer). For a thorough treatment 
of the latter point see Froese and Ziemke (2009). 

We broadly agree on the three (or four) main require¬ 
ments of individuality, perception and action, as well as 
goal-directedness. However we are not satisfied with the 
lack of formal definitions of the notions themselves. We 
therefore take a different and particularly formal approach 
to the problem of defining agents. 

From the start we limit ourselves to a mathematically 
well-defined class of systems i.e. dynamical systems and 
their generalization to stochastic processes (we will refer to 
dynamical systems only, inclusion of stochastic processes is 
implied). We want to define agents as entities that can exist 
within a dynamical system. In other words, we are look¬ 
ing for a representation of agents within dynamical systems. 
While there is no guarantee that such a representation even 
exists, we believe that even if we fail, there might be some 
insights into why we fail. This would also help the commu¬ 
nity to understand the concept of agents better. At the same 
time we expect that dynamical systems are actually a pow¬ 
erful enough class of systems to consider and that they will 
turn out to be able to contain convincing examples of agents. 
This optimism stems from the fact that dynamical systems 
have been extremely successful in modeling systems from 
physics through chemistry to biology. Compelling recent 
examples of dynamical systems which directly suggest they 
can contain agents can be found in Virgo (2011); Bartlett and 
Bullock (2015). If we are successful, then we would obtain 
a definition of agents as features of dynamical systems and 
eventually even of life as a feature of such systems. This 
would be a step towards defining life as a natural kind as re¬ 
quired by Cleland and Chyba (2002). Finally our hope is to 
reveal the formal counterparts of the intuitions about living 
systems formulated by Maturana and Varela (1980). 

In order to make it more clear what we mean by agents 
within a dynamical system, consider the following exam¬ 
ple, to which we will come back throughout this paper. Say 
we had a dynamical system that is a sufficiently exact ap¬ 
proximation of the entire biosphere including the influence 
of incoming (from the sun) and outgoing radiation. During 
individual runs of this dynamical system, given the right initial 
conditions, things should occur that correspond to living 
organisms in the real biosphere. In this case we would say 
that within this dynamical system agents occur. Our goal is 
to find a mathematical representation of these agents. Since 
agents are a generalization of living organisms, we expect 
that agent representations can at least in principle exhibit 
the full range of phenomena exhibited by living organisms. 
Limitations should only be due to the chosen dynamical sys¬ 
tem and not inherent to the agent representation. 

This paper is a contribution to the discussion of the foun¬ 
dations of artificial life. It does not present a solution of 
how to represent agents in dynamical systems. Rather it 
defines a notion that can identify intrinsically distinguished 
spatiotemporal patterns that we believe can act as the basic 
building block on which a theory of agents can be built. The 
strategy we have in mind here is the following. First, define 
the spatiotemporal patterns which are suitable to represent 
both living (bacteria, animals, plants) and non-living (rocks, 
crystals) persistent objects. Then further classify those pat¬ 
terns into classes exhibiting features of agents such as per¬ 
ception, action and goal-directedness. Spatiotemporal pat¬ 
terns that satisfy all criteria will represent agents. 

Also note that for the formal definition we here restrict 
ourselves to finite discrete-time distributed 1 dynamical sys¬ 
tems with an already given “space-like” and “time-like” 
structure. Examples of this include cellular automata. The 
restriction to finiteness is due to the improved clarity this 
choice brings with it. The notions we present are well- 
defined in various more general settings. However, currently 
the spatiotemporal-like structure seems necessary to us. 

The rest of this paper is structured as follows. The next 
section will present three challenges to representations of 
agents in distributed dynamical systems. Then we look at the 
literature and discuss ways to represent agents formally and 
in how far they succeed or fail to meet our expectations. We 
will then quickly introduce the setting of distributed dynam¬ 
ical systems and formally introduce a notion that we believe 
is able to identify the spatiotemporal patterns. We give the 
intuition behind this notion and discuss it in the light of the 
three requirements mentioned before. Finally, we present 
some preliminary results in the setting of the game of life. 

The problem of tracking agents 

As mentioned before, we expect the agent representation to 
be able to deal with all features associated with living organisms 
in the biosphere. Two such features, their metabolism 
and their motility, present a major challenge to the representation 
of agents. These two features both make it hard from 
a formal standpoint to “keep track” of the living organism 
within a trajectory of the system. A third feature, which we call 
counterfactual variation and attribute to the biosphere, 
makes it hard to represent agents reliably across different 
initial conditions. This list of three features makes no claim 
to be complete; obtaining a complete list is ongoing research. 
The three features in more detail: 

1 Distributed means that the state of the system is given as a set 
of values of multiple variables or degrees of freedom. 

Metabolism All known living organisms are metabolic 
(Szathmary et al., 2005) and metabolism is also under 
discussion for its possible role in the origins of life (see e.g. 
Dyson, 1985; Kauffman, 2002). This highlights its fundamental 
role and any final agent representation must accommodate 
it. The difficulty is the following. 

Assume that the sufficiently accurate biosphere model 
from the introduction is particle-based, i.e. it describes the 
time evolution of the degrees of freedom of all the particles 
in the biosphere. Say at a time $t_1$ we are given all the particles 
(and their degrees of freedom) that pertain to some bacterium. 
Then a naive way to represent this bacterium would 
be to just track the time evolution of each of those particles. 
This we could (in principle) easily do in our model as well. 
However the particles that the bacterium is made up of at a 
later time $t_2$ are not the same as those at time $t_1$ because of 
the bacterium’s metabolism. We would end up with particles 
floating around in the environment of the bacterium and not 
the bacterium itself. At the same time there would be particles 
that now pertain to the bacterium that we would not be 
tracking. 

An agent representation therefore would need to be able 
to track the bacterium itself and not just a specific set of 
degrees of freedom. One way this could be solved is by con¬ 
stantly readjusting or refocusing on the degrees of freedom 
pertaining to the agent. 

Note that we cannot be entirely sure that there is no coor¬ 
dinate transformation which would let us track living organ¬ 
isms (i.e. their corresponding structures in a model) by just 
following a particular set of degrees of freedom. However 
we are not aware of such a transformation. Any criterion 
however that can be used to refocus on an agent should be 
related to any coordinate transformation which results in the 
“agents’ own” coordinate system. 

Motility Living organisms can be motile and like the 
metabolism motility is in the discussion for its role in the ori¬ 
gins of life (Froese et al., 2014). A representation of agents 
must therefore be capable of dealing with motile agents. 
Motility plays a similar role for field theory models of the 
biosphere as the metabolism plays for particle based mod¬ 
els. The degrees of freedom of a field theory are the field 
amplitudes at each point in space so that tracking those de¬ 
grees of freedom over time only means to track the field in 
a specific region of space. However motility demands that 
agents are not bound to a fixed region in space. Then we 
again need to adjust (track) the degrees of freedom that con¬ 
stitute the agent as time passes. 
Counterfactual variation A third feature concerns another 
kind of variation of the degrees of freedom that can 
represent agents in a dynamical system, namely variation 
under different initial conditions. We attribute to the biosphere 
a large variety of possible counterfactual histories 
that also support living organisms. Think of a biosphere 
where the continents are shifted a bit for example. This 
would not seem to necessarily destroy the possibility of the 
biosphere (geosphere) to contain living organisms. 

Furthermore, we attribute to agents and living organisms 
the capability to behave differently under different environ¬ 
mental situations. The agent should be able to “take a deci¬ 
sion” i.e. to walk either right or left, or eat the apple or the 
pear. Depending on these “decisions” the agent will again 
pertain to different degrees of freedom. 

The counterfactual histories can be studied in the dynam¬ 
ical system setting by studying multiple trajectories through 
state space. Each trajectory corresponds to a different his¬ 
tory (and possibly future). If the “same” agent occurs in two 
different trajectories it can behave differently in one from the 
other. This can be associated with different decisions (e.g. 
Ikegami and Taiji, 1998). 

The existence of benign counterfactual histories in our 
biosphere is an assumption and not possible to prove. How¬ 
ever it is in line with the successful way physics models sys¬ 
tems (cf. the models of Virgo, 2011; Bartlett and Bullock, 
2015) and therefore in line with our general approach. Now 
given a set of counterfactual histories containing living or¬ 
ganisms we expect that the degrees of freedom which in one 
history pertain to a living system at time t need not pertain 
to a living system within another such history at t (or in fact 
ever). More specifically, the degrees of freedom pertaining 
to a bacterium in one history need not pertain to any living 
organism in another. 

If the biosphere can contain living organisms within var¬ 
ious counterfactual histories, then the dynamical systems 
model of the biosphere must be able to contain agents un¬ 
der various initial conditions. In that case the agent repre¬ 
sentation must be able to represent all the agents in all the 
trajectories where they occur. If for two different initial con¬ 
ditions the degrees of freedom pertaining to agents at time t 
are different as well, then the agent representation must be 
able to exhibit this difference. 

Related work 

We should stress that we are only interested in work that 
relies on the intrinsic properties of the dynamical systems 
itself to represent agents. References to concepts like ac¬ 
tion, perception, and goal-directedness, if they are not de¬ 
fined in terms of the dynamical system are not acceptable 
in this case. The publication that most directly tackles the 
problem of agent representation that we are aware of is the 
insightful paper of Krakauer et al. (2014). They solve the 
problems of metabolism and motility by evaluating informational 
measures of closure and autonomy of sets of random 
variables. Given a system represented by a set of random 
variables at each point in time (i.e. represented by a dynam¬ 
ical Bayesian network as also defined below) they propose 
an algorithm that decides whether to include a random vari¬ 
able at a specific time step into the set representing the agent 
or leave it in the set representing the environment. This de¬ 
cision is made according to whether the inclusion into the 
agent contributes to the closure or autonomy of the agent. 
What this approach lacks however is the capability to deal 
with counterfactual variation. Since they use measures like 
mutual information and conditional mutual information that 
average over all states of the random variables in order to 
decide whether they belong to the agent or not, the partition 
of the random variables at each time step is fixed for all possible 
trajectories of the system. In order to deal with counterfactual 
variation it must be possible to have one partition 
into agent and environment for one trajectory and another 
partition for another trajectory. The same argument remains 
true for any approach that results in a fixed partition of the 
nodes in a dynamical Bayesian network. This includes the 
work of Balduzzi (2011) which results in a coarse-grained 
version of the network. The effective information that the 
glider contains about past states of the game of life, which 
was revealed in this work, should however be related to the 
intrinsic spatiotemporal patterns that we investigate here. 

Another very relevant and inspiring work is the work on 
the cognitive domain of the glider and autopoiesis in the 
game of life by Beer (2014b,a). This approach is capable 
of dealing with metabolism, motility, as well as counterfac¬ 
tual variation as it analyses spatiotemporal patterns and their 
internal mechanisms. The spatiotemporal patterns may have 
finite extension and can therefore occur or not occur within 
multiple trajectories at multiple times. The internal mech¬ 
anisms are analyzed with respect to their production of the 
next spatial pattern inside the spatiotemporal pattern. The 
only caveat seems to be that the analysis is quite time con¬ 
suming and does not have formal expressions of all the in¬ 
volved notions. We use the notion of spatiotemporal patterns 
as presented by Beer and hope that the measure we propose 
contributes to the formalization of the notions in his work. 

An approach that seems to solve the problem of 
metabolism and counterfactual variation is the Markov 
blanket-based clustering used by Friston (2013). As the in¬ 
teracting degrees of freedom vary over time in a particle 
based system, it is possible to define a time dependent ad¬ 
jacency (or interaction) matrix. From this matrix Friston 
derives a Markov blanket matrix which can be used to clas¬ 
sify the degrees of freedom into hidden, sensory, active, and 
internal states. This nicely defines an agent like structure 
within the degrees of freedom and through the time depen¬ 
dence of the adjacency and therefore also the Markov blan¬ 
ket matrix allows for the degrees of freedom to vary within a 
single trajectory and across initial conditions. In the case of 
a field theoretical model where the adjacency of the degrees 
of freedom does not vary it is not directly obvious to us how 
to translate this. This means that motility could be a problem 
for the approach in such a model. However, it is definitely an 
alternative to our more information theory-based approach. 

Methodologically, the framework of Lizier et al. (2014) 
for distributed computation is very closely related to ours. 
They investigate localized versions of mutual information 
and conditional mutual information to track and highlight 
information transfer, storage, and modification in dynami¬ 
cal Bayesian networks. This reveals spatiotemporal patterns 
very similar to ours. The main formal difference is in fact 
that instead of localizing the (conditional) mutual informa¬ 
tion we localize multi-information in the same way. In this 
way our work is just a trivial extension of this work. The fo¬ 
cus of our work however is different as we are not so much 
interested in phenomena that are related to computation and 
more interested in revealing spatiotemporal entities or ob¬ 
jects which might form the basis of an agent definition. Re¬ 
lated work on spatiotemporal filtering (Shalizi et al., 2006; 
Flecker et al., 2011) of cellular automata differs from ours 
in a similar way. While “interesting” phenomena in the time 
evolution of single trajectories are revealed, the focus is not 
on connecting the interesting phenomena together in order 
to obtain entities. 

Conceptually our work is also closely related to the in¬ 
tegrated information theory due to Tononi et al. Origi¬ 
nally (Tononi et al., 1994) this involved measurements of 
multi-information whose localized (in the sense of Lizier 
et al.) version we also employ as an estimate of integra¬ 
tion. Newer versions (Oizumi et al., 2014; Albantakis and 
Tononi, 2015) involve a more elaborate construction which, 
importantly, is also localized in a certain way. The lat¬ 
ter detect distinguished integrated spatial patterns which are 
constructed to resolve “what a system ‘is’ from its own in¬ 
trinsic perspective” (Albantakis and Tononi, 2015). How 
these spatial structures are connected in time however is not 
treated. Our approach also aims at revealing intrinsic struc¬ 
ture but crucially looks for spatiotemporal patterns i.e. pat¬ 
terns with a temporal as well as a spatial extension or com¬ 
positional structure. In Tononi (2004); Balduzzi and Tononi 
(2008) temporal integration is mentioned with respect to op¬ 
timal spatial and temporal scale or “grain size” detection. 
Our goal is different since we don’t want to find a coarse- 
graining here. We want to reveal the complete lifetimes of 
agents as a single spatiotemporal pattern. 

Dynamical Bayesian networks 

Finite discrete-time distributed dynamical systems and their 
stochastic counterparts can be represented by dynamical 
Bayesian networks. Dynamical here just means that there 
is an interpretation of time in those networks. Distributed 
means that, at each time step, there are multiple given random 
variables whose states together define the state of the 
entire network at that time step. 

More formally, a (dynamical) Bayesian network is a directed 
acyclic graph $G = (V, E)$ with nodes $V$ and edges 
$E$. Each node $i$ has an associated random variable $X_i$ with 
state space $\mathcal{X}$ taking values $x_i \in \mathcal{X}$ (for simplicity we 
assume that all nodes have identical state spaces but this is 
not necessary for the definitions to hold). Furthermore each 
node is equipped with a mechanism $p_i(x_i \mid x_{\mathrm{pa}(i)})$ which 
gives the conditional probability distribution of $X_i$ given 
the parents $\mathrm{pa}(i)$ of node $i$ in $G$. Note that for any set 
$A \subseteq V$ we write $X_A := (\{X_i \mid i \in A\})$ for the random 
variable composed of the random variables in $A$. We assume 
that our network has a set $V_0$ of nodes without parents. 
As Ay and Polani (2008) note we can then define a 
partition of $V$ into $(V_0, V_1, V_2, \ldots)$ (called time slices) where 
$V_{t+1} := \{i \in V \mid \exists j \in V_t, \mathrm{pa}(i) \ni j\}$. In general 
$\mathrm{pa}(V_{t+1}) \subseteq V_t$ since some nodes might not have any children. 
Here we assume $\mathrm{pa}(V_{t+1}) = V_t$. This allows us to 
interpret the various nodes in each $V_t$ as those nodes representing 
the state of the distributed system at time $t$. We can 
also interpret the cardinality of the set $V_t$ as the spatial extension 
of the state. In this paper this cardinality does not 
change over time just as e.g. in cellular automata. 
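
The construction of the time slices can be sketched in a few lines. This is an illustration under stated assumptions rather than code from the paper: the graph is given as a dictionary from each node to its set of parents, nodes are labelled `(t, cell)` only for readability, and the function name `time_slices` is hypothetical.

```python
def time_slices(parents):
    """Return [V_0, V_1, ...] for a DAG given as {node: set of parents}.

    V_0 holds the nodes without parents; V_{t+1} holds the nodes that have
    at least one parent in V_t.
    """
    current = {i for i, pa in parents.items() if not pa}
    assigned = set()
    slices = []
    while current:
        slices.append(current)
        assigned |= current
        # The `assigned` check only guards termination; under the paper's
        # assumption pa(V_{t+1}) = V_t every node falls into one slice anyway.
        current = {
            i for i, pa in parents.items()
            if i not in assigned and pa & current
        }
    return slices

# Hypothetical network: two cells unrolled over three time steps, where
# every node depends on both cells of the previous step.
parents = {}
for t in range(3):
    for cell in range(2):
        parents[(t, cell)] = set() if t == 0 else {(t - 1, 0), (t - 1, 1)}
slices = time_slices(parents)
```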

The defining property of Bayesian networks (including 
dynamic ones) is that the joint probability distribution$^2$ $p_V$ 
can be factorized in a way compatible with the structure of 
the graph $G$ i.e.: 

$$ p_V(x_V) = \prod_{i \in V} p_i(x_i \mid x_{\mathrm{pa}(i)}). \qquad (1) $$

To relate the dynamical Bayesian network to dynamical 
systems note that the role of the dynamical law is played by 
the product of all mechanisms in $V_t$: 

$$ p_{V_{t+1}}(x_{V_{t+1}}) = \sum_{x_{V_t}} \prod_{i \in V_{t+1}} p_i(x_i \mid x_{\mathrm{pa}(i)}) \, p_{V_t}(x_{V_t}). \qquad (2) $$

Recall that $\bigcup_{i \in V_{t+1}} \mathrm{pa}(i) = V_t$ by definition. We can also 
write the above in terms of the Markov matrix: 

$$ p(x_{V_{t+1}} \mid x_{V_t}) = \prod_{i \in V_{t+1}} p_i(x_i \mid x_{\mathrm{pa}(i)}). \qquad (3) $$

In order to equip the dynamical Bayesian network with a 
joint probability distribution $p_V$ we then only have to define 
an initial probability distribution $p_{V_0}$ and propagate it 
throughout the network according to Eq. 2. 
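
A single propagation step according to Eq. 2 can be sketched as follows. The representation is an assumption made for illustration: a distribution over a time slice is a dictionary from state tuples to probabilities, each mechanism is a pair of parent indices and a conditional probability function, and the names `propagate` and `copy` are hypothetical.

```python
from itertools import product

def propagate(p_t, mechanisms, states=(0, 1)):
    """One step of Eq. 2: sum over x_{V_t} of prod_i p_i(x_i | x_pa(i)) p_t."""
    n = len(mechanisms)
    p_next = {}
    for x_next in product(states, repeat=n):
        total = 0.0
        for x, p_x in p_t.items():
            transition = 1.0
            for i, (pa, mech) in enumerate(mechanisms):
                # Mechanism of node i, evaluated on the values of its parents.
                transition *= mech(x_next[i], tuple(x[j] for j in pa))
            total += transition * p_x
        p_next[x_next] = total
    return p_next

def copy(x_i, x_pa):
    """Deterministic mechanism: the node copies its single parent."""
    return 1.0 if x_i == x_pa[0] else 0.0

# Hypothetical two-node system in which the nodes swap their values.
mechanisms = [((1,), copy), ((0,), copy)]
p_1 = propagate({(0, 1): 1.0}, mechanisms)
uniform = {x: 0.25 for x in product((0, 1), repeat=2)}
p_uniform = propagate(uniform, mechanisms)
```

Applying `propagate` repeatedly, starting from an initial distribution, yields the distribution of every time slice.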

Trajectories and spatiotemporal patterns 

Here we formally define the notion of trajectories and spa¬ 
tiotemporal patterns. The class of the spatiotemporal pat¬ 
terns is very large and includes patterns that are of no spe¬ 
cific interest. How we distinguish between those and more 
important patterns will be defined in the next sections. 

2 For a set of nodes $A$ we write $p_A$ for the probability distribution 
$p_A : \mathcal{X}^{|A|} \to [0,1]$. 
A spatiotemporal pattern $x_O$ of a dynamical Bayesian 
network on graph $G = (V, E)$ is a set of nodes $O \subseteq V$ 
together with a set of particular values $\{x_i \in \mathcal{X} \mid i \in O\}$. 

A trajectory $x_V$ of a dynamical Bayesian network on 
graph $G = (V, E)$ is a spatiotemporal pattern with $p(x_V) > 0$. 
In our setting this also means that there is an initial condition 
$x_{V_0}$ such that $x_V$ is possible under the time evolution 
induced by the Markov matrix or dynamical law. 

We say that the spatiotemporal pattern $x_O$ occurs in a trajectory 
$x_V$ of the network iff $x_O \subseteq x_V$. 

Employing the time slices $V_t$ of the network we can also 
look at the time slices $x_{O_t} \subseteq x_{V_t}$ of any spatiotemporal 
pattern $x_O$. 
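
These definitions translate directly into a small data structure. The sketch below assumes that a pattern or trajectory is stored as a dictionary from nodes to values and that nodes are labelled `(t, cell)`; both choices and the function names are illustrative only.

```python
def occurs_in(pattern, trajectory):
    """True iff every (node, value) pair of the pattern is in the trajectory."""
    return all(
        node in trajectory and trajectory[node] == value
        for node, value in pattern.items()
    )

def time_slice(pattern, t):
    """Restriction of a pattern to the nodes of time slice t."""
    return {node: value for node, value in pattern.items() if node[0] == t}

# Hypothetical trajectory of two cells over two time steps.
trajectory = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 1}
diagonal = {(0, 0): 1, (1, 1): 1}   # fixes one cell per time step
absent = {(0, 0): 1, (1, 0): 1}     # disagrees with the trajectory at (1, 0)
```

Because a pattern need not fix every node, the same pattern can occur in many trajectories, or in none.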

Integrated spatiotemporal patterns 

This section defines the notion of an integrated spatiotem¬ 
poral pattern. Such patterns obey a condition which distin¬ 
guishes them within the class of all spatiotemporal patterns. 
First we fix some further terminology. 

We define the evidence for integration of an object $O$ 
with respect to a partition $\pi$ of $O$ as the local mutual information 

$$ \mathrm{mi}_\pi(x_O) := \begin{cases} \ldots & \text{if } p_O(x_O) = 0, \\ \log \dfrac{p_O(x_O)}{\prod_{b_j \in \pi} p_{b_j}(x_{b_j})} & \text{else.} \end{cases} \qquad (4) $$

What about the spatial integration however? The existence 
of rock in one place probably does increase the probability 
for more rock to be around it, but not extremely. It is 
perfectly possible, and occurs frequently, that the rock ends, 
or that it is just a small piece of rock. So the evidence for 
spatial integration might not be so strong. 

Now if we turn to living organisms, the evidence for temporal 
integration should also be high since they are autopoietic. 
Their spatial integration will probably be higher than 
that of rocks (and crystals) as half a bacterium is much less 
likely than a whole whereas half a rock is still a rock and 
those are not so uncommon. This reasoning scales up to 
larger living organisms. 

We note that the evidence can also be interpreted in more 
information-theoretic terms, for example as the superfluous 
length of a codeword for the sequence $x_O$ when we 
base the encoding on the product probability distribution 
$\prod_{b_j \in \pi} p_{b_j}(x_{b_j})$ instead of on the joint probability. This will 
be discussed in more detail in future work. We also intend to 
investigate how far the integrated spatiotemporal patterns 
are independent of (possibly moving) frames of reference. 
Since they are not only integrated across time slices but 
instead across any partition we are optimistic in this regard. 



Then, we say a spatiotemporal pattern xo is integrated iff 
for all possible partitions it of the set O of random variables 
the evidence for integration of O with respect to it is posi¬ 
tive. Considering all possible partitions is also done by Al- 
bantakis and Tononi (2015). 

The interpretation of this is the following. The joint prob¬ 
ability po {xo ) is the probability that all the Xi with i £ O 
occur together within single trajectories. I.e. among all tra¬ 
jectories there are those in which O occurs and their prob¬ 
ability contributes to this joint probability. The probability 
Ylb ETrPbjizbj) however is the product of the probabilities 
that each part x^ occurs by itself in any trajectory including 
as part of x<j • If a part of xo often occurs by itself without 
the rest of xo occurring then this reduces the evidence for 
integration of O. This makes sense if we want to interpret 
the integrated spatiotemporal patterns as persistent objects 
like rocks, crystals, but also living organisms. If we con¬ 
sider for example a rock, the probability that a rock occurs at 
some point in time without a rock occurring at the previous 
and next time step in close vicinity is quite low, whereas the 
probability that where there was a rock before there will be 
a rock shortly after is quite high. In fact anytime that a spa¬ 
tial pattern (a time slice of a spatiotemporal pattern) causes 
(in an intuitive sense) another spatial pattern at the next time 
step their joint probability will rise and especially if the first 
spatial pattern is among the only causes of the second, their 
evidence for integration will be high. 
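The evidence for integration of Eq. 4 and the integration test can be sketched as follows. The sketch rests on assumptions that are not stated in the text: trajectories are tuples indexed by node, the joint distribution is a dictionary from trajectories to probabilities, the logarithm is taken to base 2, and the trivial one-block partition (whose evidence is always zero) is skipped. All function names are hypothetical.

```python
from math import log2

def partitions(items):
    """Yield every partition of a list as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` into each existing block in turn, or into a new block.
        for k in range(len(part)):
            yield part[:k] + [[first] + part[k]] + part[k + 1:]
        yield [[first]] + part

def marginal(joint, nodes, values):
    """p_b(x_b): probability of all trajectories that agree with x_b on b."""
    return sum(p for traj, p in joint.items()
               if all(traj[n] == values[n] for n in nodes))

def evidence(joint, pattern, partition):
    """Eq. 4: local mutual information of x_O with respect to a partition."""
    p_joint = marginal(joint, list(pattern), pattern)
    if p_joint == 0:
        return 0.0
    log_product = sum(log2(marginal(joint, block, pattern)) for block in partition)
    return log2(p_joint) - log_product

def is_integrated(joint, pattern):
    """Positive evidence for every partition with at least two blocks (assumed)."""
    return all(evidence(joint, pattern, part) > 0
               for part in partitions(list(pattern)) if len(part) > 1)

# Toy distributions over three binary nodes (assumed for illustration).
correlated = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
independent = {(a, b, c): 0.125 for a in (0, 1) for b in (0, 1) for c in (0, 1)}
pattern = {0: 1, 1: 1, 2: 1}
print(is_integrated(correlated, pattern), is_integrated(independent, pattern))
```

In the correlated toy distribution the parts never occur without the whole, so every partition yields positive evidence; in the independent one the joint probability equals the product and the evidence is zero.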


Integrated spatiotemporal patterns and the tracking problem

The spatiotemporal patterns can solve the tracking problem. To see this, take the perspective of time slices and say that at some time $t$ a living organism is a configuration of degrees of freedom which increases the probability of a particular configuration of other degrees of freedom at a subsequent time $t + \epsilon$ that is again a living organism. More specifically, a living organism will lack certain molecules before absorbing them; conversely, there will be a surplus of other molecules before they are ejected from the living organism. Therefore the probability for molecular exchange will be higher than for maintaining the same composition. This means the spatiotemporal patterns traversing the degrees of freedom associated with the molecules will have a higher evidence for integration over time. Similarly, in the field-theoretic setting the field configuration represented by the spatial pattern will increase the probability of the neighboring degrees of freedom to assume a certain configuration. This leads to more evidence for the integration of the moving pattern.

With respect to the problem of counterfactual variation we can see the following. Integration is calculated directly for spatiotemporal patterns within a trajectory and the local mutual information vanishes for all spatiotemporal patterns that do not occur in this trajectory (see Eq. 4). Then, if the spatiotemporal patterns that occur in different trajectories are different, the integrated spatiotemporal patterns will also be different. Thus, if integrated spatiotemporal patterns represent agents, these can occur on some degrees of freedom in one trajectory and not occur on those in another. This means counterfactual variation won't be a problem.

Experimental indications 

We present here the results of three preliminary experiments. The first is conceived to hint at the kind of trajectories that show high evidence of integration. The second experiment suggests that traversal of degrees of freedom or motility/metabolism can at least in principle be detected by integration. Similarly, the third experiment shows that in principle counterfactual variation is no obstacle for integration.

All experiments use a 4 x 4 grid with game-of-life dynamics and toroidal boundary conditions as the distributed dynamical system. As the initial distribution $p_{V_0}$ we use the uniform distribution in order to explore the whole range of possible trajectories. We investigate only patterns covering three time steps t = 8, 9, 10 and thereby neglect a lot of transient patterns that are difficult to interpret. In principle, however, our method does apply to transient patterns as well. Instead of integration we calculated only the evidence for integration with respect to the finest possible partition (EVIFPP). The finest possible partition of a set $A$ of nodes is just the partition where each block is a set containing exactly one node in $A$. A positive EVIFPP is a necessary condition for integration and therefore a crude indication for it.
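The EVIFPP of global patterns under this setup can be computed by brute-force enumeration. The following pure-Python sketch is not the authors' implementation: the function names, the base-2 logarithm and the representation of states as tuples are assumptions. With `n=4` it enumerates all 65,536 initial conditions; the demo call uses a 3 x 3 grid only to stay fast.

```python
from collections import Counter
from itertools import product
from math import log2

def life_step(state, n):
    """One game-of-life update on an n x n torus; state is a tuple of 0/1."""
    new = []
    for r in range(n):
        for c in range(n):
            live = sum(state[((r + dr) % n) * n + (c + dc) % n]
                       for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                       if (dr, dc) != (0, 0))
            alive = state[r * n + c]
            new.append(1 if live == 3 or (alive and live == 2) else 0)
    return tuple(new)

def evifpp_of_global_patterns(n=4, times=(8, 9, 10)):
    """EVIFPP of each occurring global pattern, uniform initial distribution."""
    cells = n * n
    step_cache = {}                        # memoised dynamics
    joint = Counter()                      # counts of global patterns
    ones = [[0] * cells for _ in times]    # count of value 1 per slice and cell
    total = 0
    for init in product((0, 1), repeat=cells):
        state, slices = init, []
        for t in range(max(times) + 1):
            if t in times:
                slices.append(state)
            if state not in step_cache:
                step_cache[state] = life_step(state, n)
            state = step_cache[state]
        joint[tuple(slices)] += 1
        for k, time_slice in enumerate(slices):
            for i, value in enumerate(time_slice):
                ones[k][i] += value
        total += 1
    scores = {}
    for pattern, count in joint.items():
        score = log2(count / total)        # log of the joint probability
        for k, time_slice in enumerate(pattern):
            for i, value in enumerate(time_slice):
                hits = ones[k][i] if value else total - ones[k][i]
                score -= log2(hits / total)  # minus log of each node marginal
        scores[pattern] = score
    return scores

scores = evifpp_of_global_patterns(n=3)    # small grid, demo only
blank = ((0,) * 9,) * 3
print(len(scores), scores[blank] > 0)
```

Local patterns that fix only some cells per time slice would additionally require marginalising the joint counts over the unspecified cells, which the sketch omits.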

For the first experiment we looked at all trajectories that differ at time steps t = 8, 9, 10 (a lot of trajectories end up with all cells white at those times). For each of those trajectories we calculated the EVIFPP for the spatiotemporal pattern $x_O = (x_{V_8}, x_{V_9}, x_{V_{10}})$. The time slices of $x_O$ are thus global states, so this is more an evaluation of the integration of the trajectories that result from the different initial conditions. Fig. 1 shows three such global three-time-step patterns: the completely blank spatiotemporal pattern and the two spatiotemporal patterns (ignoring symmetric versions) with the highest EVIFPP. We can see that the blank spatiotemporal pattern has positive but much lower EVIFPP than some other patterns.

For the second experiment we chose a specific trajectory, shown in the first row in Fig. 2, which exhibits a moving pattern, and searched through all patterns covering time steps t = 8, 9, 10 and fixing n = 14 cells (i.e. nodes of the dynamic Bayesian network) in each time slice $x_{O_t}$. We can see that the degrees of freedom (i.e. the cells or nodes) making up both the spatiotemporal pattern with minimal EVIFPP (second row in Fig. 2) and that with maximal EVIFPP (third row in Fig. 2) vary over the three time steps and adapt to the configuration of the global state. Note that the patterns with minimal and maximal EVIFPP are not unique.

For the third experiment we changed the initial condition of the trajectory by shifting all values of the initial condition of the second experiment “down” one cell. This results in a different trajectory shown in the first row of Fig. 3. We then evaluated the spatiotemporal pattern that results from fixing the same nodes as in the spatiotemporal pattern with maximal EVIFPP found in the second experiment on the changed trajectory (see row two in Fig. 3). We also evaluated the EVIFPP of the spatiotemporal pattern that results from shifting the fixed nodes of the maximal EVIFPP pattern in the same way as the initial condition (see row three in Fig. 3). The pattern with the same fixed nodes as the pattern that formerly had maximal EVIFPP now has lower EVIFPP than the pattern with the nodes adapted to the new initial condition.

Figure 1: Three three-time step spatiotemporal patterns. Each row shows the three global spatial patterns that make up the spatiotemporal pattern $x_O$. The first row shows the blank spatiotemporal pattern and the others show the two patterns with the highest EVIFPP. The EVIFPP values are (from top to bottom) 4.9, 81.9, and 85.4 respectively.

Discussion 

The first experiment shows that the completely blank trajectory has low spatiotemporal EVIFPP and that more “interesting” trajectories have higher EVIFPP (Fig. 1). This can also be done with other methods, e.g. counting black cells. However, our method is general and doesn't use any prior knowledge, e.g. which color of cells to count. For us this result is a necessary condition for further investigation.

The second experiment shows that the degrees of freedom pertaining to spatiotemporal patterns with high EVIFPP adapt over time to the changing configurations of the system. This shows that EVIFPP is capable of solving the metabolism and motility problems. We expect that the same holds true for evidence of integration with respect to any partition and therefore also for integration itself.

The third experiment shows that under a variation of the initial condition the degrees of freedom pertaining to spatiotemporal patterns with high EVIFPP change accordingly. Since the different trajectories generated from changed initial conditions correspond to counterfactual histories, this shows that the EVIFPP solves the problem of counterfactual variation. Again we expect this to carry over to integration.

Figure 2: A three-time step part of a trajectory (can also be seen as a global spatiotemporal pattern) in the first row and two local spatiotemporal patterns on this trajectory in the second and third row. Both spatiotemporal patterns in rows two and three have n = 14 specified cells per time slice. The second (third) row shows a pattern attaining the minimal (maximal) EVIFPP of 32.5 (54.4) among all patterns with n = 14 on the trajectory of row one. The global spatiotemporal pattern of row one has EVIFPP of 55.0.

We note that larger grids very quickly become hard to evaluate computationally. For square grids the size of the Markov matrix grows with $2^{a^2}$, where $a$ is the number of rows and columns of the grid. We also note that, due to the very limited grid size we are studying, any two cells are separated by at most one intermediate cell. This leads to strong dependencies which might make it irrelevant to place unspecified cells around patterns like the blinker (as for example suggested by Beer (2014a)). We had hoped to reveal such well-known patterns and their extensions. Turning to larger grids is a next step in our research.
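The growth rate can be made tangible with a short calculation. In this sketch the function name `state_count` is hypothetical; it counts the global states of a binary a x a grid, which is the number of rows of the Markov matrix, and the full matrix has that number squared as entries.

```python
def state_count(a):
    """Number of global states of an a x a binary grid: 2^(a^2)."""
    return 2 ** (a * a)

for a in (3, 4, 5, 6):
    n = state_count(a)
    print(f"a={a}: {n} global states, {n * n} Markov matrix entries")
```

Already for a = 6 the number of global states exceeds 6 x 10^10, which illustrates why exhaustive enumeration is limited to very small grids.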

Conclusion 

We have presented our current approach to representing agents in dynamical systems. Three criteria that we expect from such an agent representation were motivated with a thought experiment involving a dynamical systems model of the biosphere. The literature was reviewed in the light of these criteria. We also introduced our current candidate measure for identifying intrinsic spatiotemporal patterns in dynamical Bayesian networks. These patterns form the basic building blocks of our approach to representing agents. We argued that this approach can deal with the three criteria for agent representations that we have put forward. Experimentally we verified this for a crude approximation to our more involved concept of integration. However, experimental results are currently inconclusive with respect to the tracking of structures that are actually relevant for agents. Therefore we see the value of this work mostly as a contribution to the discussion of the foundations of artificial life. Future work will bring more decisive results.



Figure 3: The three-time step part of the trajectory that results from shifting the values of the initial condition of the trajectory in the first row of Fig. 2 “down” by one cell. The second row shows the spatiotemporal pattern with the same fixed nodes as the pattern that had maximal EVIFPP on the trajectory of Fig. 2 but now on the shifted trajectory. The EVIFPP of this is 39.8. The third row shows the spatiotemporal pattern with the fixed nodes shifted “down” in the same way as the initial condition. This pattern has EVIFPP of 54.4 and is the maximal EVIFPP for patterns with n = 14. As expected, this is the same value we found for the pattern with the non-shifted nodes on the non-shifted trajectory.
Abstract

We apply formal information measures of emergence, self-organization and complexity to scale-free random networks, to explore their association with structural indicators of network topology. Results show that as the number of nodes and edges grows, self-organization and relative complexity increase, while emergence and complexity decrease. Our approach offers a complementary way of studying networks in terms of information.

Introduction 

Among representative structural properties of networks we can list: the degree of nodes and their distribution, the clustering coefficient, and the average path length. The degree of a node indicates how many other nodes it is connected to. The clustering coefficient is a measure of the number of triangles in a graph. The average path length is the average number of steps along the shortest paths between all possible pairs of network nodes (Newman et al., 2006). Although these measures are valuable for characterizing some complex networks, a direct measure of complexity in networks is also desirable. Recently, measures of emergence, self-organization, complexity, and relative complexity based on information theory have been developed and their usefulness can be evaluated (Fernandez et al., 2014).
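The three structural indicators can be illustrated on a small graph. The sketch below is not the SocNetV implementation: it assumes an undirected, connected graph stored as a dictionary of neighbor sets and uses the average of the local clustering coefficients, which is one common definition and may differ from the one used by the software. All names are hypothetical.

```python
from collections import deque

def degrees(adj):
    """Degree of each node: the number of its neighbors."""
    return {v: len(nb) for v, nb in adj.items()}

def average_clustering(adj):
    """Mean local clustering coefficient (nodes of degree < 2 count as 0)."""
    total = 0.0
    for v, nb in adj.items():
        k = len(nb)
        if k < 2:
            continue
        nb_list = list(nb)
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nb_list[j] in adj[nb_list[i]])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def average_path_length(adj):
    """Mean shortest-path length over all pairs, via breadth-first search."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

# A triangle (0, 1, 2) with one pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(degrees(adj), average_clustering(adj), average_path_length(adj))
```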

In this paper we analyze the association of topological structural indicators like the number of nodes, clustering coefficient and average path length with formal measures of emergence, self-organization, complexity, and relative complexity in scale-free random networks.

In the next section we present the methods for generating networks and the formalism to measure complexity. In the Results and Discussion section, we briefly present and discuss our results obtained by applying multivariate unsupervised machine learning techniques. The Final Remarks section presents conclusions and future work.

Methods 

Using the Barabasi-Albert model implemented in the SocNetV software (Kalamaras D., 2015), ten random networks were generated for each of the following numbers of nodes: 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 225, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1000, 2000 and 3000. In total, 310 scale-free networks were created. First, the structural properties of clustering coefficient and average path length were calculated. Then, information measures of emergence ($E$), self-organization ($S$), complexity ($C$), and relative complexity ($R$) were applied to the degree vector obtained by summing each row of the adjacency matrix. In summary, $E$ is equivalent to Shannon information (Shannon, 1948), which depends on the probabilities $p_i$ of all symbols $i$ in a finite alphabet: $I = -\sum_{i} p_i \log p_i$. In this work, we use $\log_{10}$. Based on this equation we define $E = I$, $S = 1 - E$, and $C = 4 \cdot E \cdot S$. The relative complexity is $R = C_{N_i} / C_{N_T}$, where $C_{N_i}$ is the complexity of network $i$ and $C_{N_T}$ is the average complexity of the other networks. Since $E, S, C \in [0,1]$, a numerical, color and category scale has been defined for a better interpretation. The ranges are [0.8, 1], [0.6, 0.8), [0.4, 0.6), [0.2, 0.4), and [0, 0.2). The corresponding colors are: blue, green, yellow, orange, and red. The matching categories are: very high, high, fair, low, and very low. $R$ has just two color codes: blue if $R > 1$ and red if $R < 1$.
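The measures defined above can be sketched as follows. How the degree vector is turned into symbol probabilities is not specified here, so the sketch takes the probabilities $p_i$ as input; the function names are hypothetical. Note that with $\log_{10}$ the value of $E$ stays within [0, 1] only for alphabets of at most ten symbols.

```python
from math import log10

def emergence(probs):
    """E = I = -sum_i p_i log10 p_i (terms with p_i = 0 contribute nothing)."""
    return -sum(p * log10(p) for p in probs if p > 0)

def self_organization(probs):
    """S = 1 - E."""
    return 1.0 - emergence(probs)

def complexity(probs):
    """C = 4 * E * S."""
    e = emergence(probs)
    return 4.0 * e * (1.0 - e)

def relative_complexity(c_values, i):
    """R: complexity of network i over the mean complexity of the others."""
    others = [c for k, c in enumerate(c_values) if k != i]
    return c_values[i] / (sum(others) / len(others))

def category(value):
    """Map a value in [0, 1] to the five-level category and color scale."""
    scale = ((0.8, "very high", "blue"), (0.6, "high", "green"),
             (0.4, "fair", "yellow"), (0.2, "low", "orange"))
    for lower, name, color in scale:
        if value >= lower:
            return name, color
    return "very low", "red"

uniform = [0.1] * 10   # ten equiprobable symbols: maximal emergence
print(emergence(uniform), self_organization(uniform), complexity(uniform))
print(category(0.545), category(0.949))
```

Because $C = 4 \cdot E \cdot S$ is maximal at $E = S = 0.5$, both a fully regular and a fully random symbol distribution yield a complexity near zero.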

To facilitate the visualization of the relationship between complexity properties and network structure descriptors as the number of nodes and edges increases, multivariate techniques were applied. A principal component analysis (PCA) was integrated with a hierarchical cluster analysis (HCPC). PCA was used to summarize and to visualize the information contained in structural attributes and complexity properties. HCPC was used for identifying clusters of networks with similar characteristics. Also, statistical indicators such as the v-test (a criterion of normal distribution), mean in cluster, overall average, and p-values were estimated to associate clusters with properties (Le and Worch, 2015).

Results and Discussion 

The PCA depicts the relationship, variation, and patterns among structural and complexity properties (Fig. 1). A high percentage of the explained variance of data was captured in the first two axes (91.46%). Considering the position of variables in the multidimensional space, it is easy to see that self-organization and relative complexity are positively correlated with the increase of nodes and edges. Meanwhile, when nodes, edges, self-organization, and relative complexity increase, the complexity and emergence of the network decrease. These two groups, in spite of their opposite behavior, are very close to the first component (Dim 1), which explains 74.24% of the variance of the scale-free networks. As a relevant fact, it is possible to see that complexity is more related to the change (emergence) in the system than to its regularity (self-organization). This suggests that some adaptability of scale-free networks could be related to high variability with a minor proportion of uniformity in the degree distribution of the nodes.

Considering the structural indicators of clustering coefficient and average path length, we can observe that they behave in opposite ways, as has been noted in the literature. They are associated with the second component, which explains a minor share of the variance of the dataset (12.22%). Given how the clustering coefficient and average path length are positioned, we cannot establish any relation between them and emergence, self-organization, complexity, and relative complexity.



Figure 1: Principal component analysis of complexity and structural properties in 310 simulations of random scale-free networks obtained using the BA model. Horizontal axis: Dim 1 (74.24%).

Regarding the increasing number of nodes, HCPC analysis allows the statistical creation and characterization of five clusters (Table 1). The first cluster includes networks with just five nodes. These systems are statistically related to the clustering coefficient, because of a relatively high density of ties in small networks. Cluster two groups networks with 10 to 50 nodes, which are the most emergent and complex of all. Indeed, in cluster two, $E$ reached the fair category (yellow), in comparison with the low category of the overall average (orange). $C$ was classified as very high (blue), whereas the overall average was categorized as high (green). Besides, we can note that these small networks have a negative relationship with the average path length, which is lower than the overall average. Networks with 60 to 175 nodes were grouped in cluster three. They also have a very high complexity. Cluster four includes networks with 200 to 600 nodes, which are the most self-organized due to having a very high $S$. Finally, networks with the highest number of nodes (700 to 3000) gain some relative complexity. That is, increasing the number of nodes and edges results in networks that are moderately more complex.

Table 1: Statistical description for Clusters and Structural Properties in Scale-free Networks.

| Cluster | Number of nodes in networks grouped | Property associated | V.Test | Property μ in cluster | Property overall μ | P-value/Significance |
| 1 | 5 | C.C. | 4.808 | 0.213 | 0.019 | 1.521e-06 *** |
| 2 | 10, 15, 20, 25, 30, 35, 40, 45, 50 | E | 4.316 | 0.545 | 0.304 | 1.586e-05 *** |
| 2 | | C | 3.177 | 0.949 | 0.676 | 1.586e-0? *** |
| 2 | | Av.P.Length | -3.165 | 2.283 | 2.619 | 1.553e-03 |
| 3 | 60, 70, 80, 90, 100, 125, 150, 175 | C | 2.116 | 0.873 | 0.676 | 0.034 * |
| 4 | 200, 225, 250, 300, 350, 400, 450, 500, 600 | S | 2.928 | 0.872 | | |
| 5 | 700, 800, 900, 1000, 2000, 3000 | R | 4.358 | | 1.003 | |
| 5 | | S | 3.082 | 0.945 | 0.696 | 2.049e-0? *** |


Final Remarks 

Our first results are encouraging. It was interesting to find that growth in random scale-free graphs implies more self-organization and relative complexity. The relative complexity could be useful to analyze cases when two or more networks interact. Thus, the gain of self-organization and relative complexity could in time be a useful characteristic to regulate feedback and guide the management of complex networks.

Further work is required. We are planning to broaden our explorations and perform further analysis to understand and clarify the relationship between complexity properties and structural indicators in random networks and other complex topologies.